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TO THE INSTRUCTOR: About This Book 


elcome to the seventh edition of The Basic Practice of Statistics. As the name 

suggests, this text provides an introduction to the practice of statistics that 

aims to equip students to carry out common statistical procedures and to 
follow statistical reasoning in their fields of study and in their future employment. 

The Basic Practice of Statistics is designed to be accessible to college and university 
students with limited quantitative background—just “algebra” in the sense of being 
able to read and use simple equations. It is usable with almost any level of technology 
for calculating and graphing—from a $15 “two-variable statistics” calculator through a 
graphing calculator or spreadsheet program through full statistical software. Of course, 
graphs and calculations are less tedious with good technology, so we recommend mak- 
ing available to your students the most effective technology that circumstances permit. 

Despite the lower mathematical level, The Basic Practice of Statistics is designed 
to reflect the actual practice of statistics, where data analysis and design of data 
production join with probability-based inference to form a coherent science of data. 
There are good pedagogical reasons for beginning with data analysis (Chapters 1 to 
7), then moving to data production (Chapters 8 to 11), and then to probability and 
inference (Chapters 12 to 27). In studying data analysis, students learn useful skills 
immediately and get over some of their fear of statistics. Data analysis is a neces- 
sary preliminary to inference in practice, because inference requires clean data. 
Designed data production is the surest foundation for inference, and the deliberate 
use of chance in random sampling and randomized comparative experiments 
motivates the study of probability in a course that emphasizes data-oriented statis- 
tics. The Basic Practice of Statistics gives a full presentation of basic probability and 
inference (16 of the 27 chapters) but places it in the context of statistics as a whole. 


Guiding Principles and the GAISE Guidelines 


The Basic Practice of Statistics is based on three principles: balanced content, experi- 
ence with data, and the importance of ideas. These principles are widely accepted 
by statisticians concerned about teaching and are directly connected to and re- 
flected by the themes of the College Report of the Guidelines in Assessment and 
Instruction for Statistics Education (GAISE) Project. 

The GAISE Guidelines include six recommendations for the introductory sta- 
tistics course. The content, coverage, and features of The Basic Practice of Statistics 
are closely aligned to these recommendations: 


1. Emphasize statistical literacy and develop statistical thinking. The intent of The 
Basic Practice of Statistics is to be modern and accessible. The exposition is straight- 
forward and concentrates on major ideas and skills. One principle of writing for be- 
ginners is not to try to tell your students everything you know. Another principle is 
to offer frequent stopping points, marking off digestible bites of material. Statistical 
literacy is promoted throughout The Basic Practice of Statistics in the many examples 
and exercises drawn from the popular press and from many fields of study. Statisti- 
cal thinking is promoted in examples and exercises that give enough background to 
allow students to consider the meaning of their calculations. Exercises often ask for 
conclusions that are more than a number (or “reject Hy”). Some exercises require 
judgment in addition to right-or-wrong calculations and conclusions. Statistics, more 
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than mathematics, depends on judgment for effective use. The Basic Practice of Statistics 
begins to develop students’ judgment about statistical studies. 


2. Use real data. The study of statistics is supposed to help students work with data 
in their varied academic disciplines and in their unpredictable later employment. 
Students learn to work with data by working with data. The Basic Practice of Statistics 
is full of data from many fields of study and from everyday life. Data are more than 
mere numbers—they are numbers with a context that should play a role in making 
sense of the numbers and in stating conclusions. Examples and exercises in The Basic 
Practice of Statistics, though intended for beginners, use real data and give enough 
background to allow students to consider the meaning of their calculations. 


3. Stress conceptual understanding rather than mere knowledge of procedures. 
A first course in statistics introduces many skills, from making a stemplot and cal- 
culating a correlation to choosing and carrying out a significance test. In practice 
(even if not always in the course), calculations and graphs are automated. Moreover, 
anyone who makes serious use of statistics will need some specific procedures not 
taught in their college statistics course. The Basic Practice of Statistics therefore tries to 
make clear the larger patterns and big ideas of statistics, not in the abstract, but in 
the context of learning specific skills and working with specific data. Many of the big 
ideas are summarized in graphical outlines. Three of the most useful appear inside the 
front cover. Formulas without guiding principles do students little good once the final 
exam is past, so it is worth the time to slow down a bit and explain the ideas. 


4. Foster active learning in the classroom. Fostering active learning is the busi- 
ness of the teacher, though an emphasis on working with data helps. To this end, 
we have created interactive applets to our specifications and made them available 
online. These are designed primarily to help in learning statistics rather than in 
doing statistics. We suggest using selected applets for classroom demonstrations 
even if you do not ask students to work with them. The Correlation and Regression, 
Confidence Intervals, and P-value of a Test of Significance applets, for example, convey 
core ideas more clearly than any amount of chalk and talk. 

We also provide web exercises at the end of each chapter Our intent is to take advan- 
tage of the fact that most undergraduates are “web savvy.” These exercises require stu- 
dents to search the web for either data or statistical examples and then evaluate what they 
find. Teachers can use these as classroom activities or assign them as homework projects. 


5. Use technology for developing conceptual understanding and analyzing 
data. Automating calculations increases students’ ability to complete problems, re- 
duces their frustration, and helps them concentrate on ideas and problem recogni- 
tion rather than mechanics. At a minimum, students should have a “two-variable 
statistics” calculator with functions for correlation and the least-squares regression 
line as well as for the mean and standard deviation. 

Many instructors will take advantage of more elaborate technology, as ASA/ 
MAA and GAISE recommend. And many students who don’t use technology in 
their college statistics course will find themselves using (for example) Excel on the 
job. The Basic Practice of Statistics does not assume or require use of software except 
in Part V, where the work is otherwise too tedious. It does accommodate software 
use and tries to convince students that they are gaining knowledge that will enable 
them to read and use output from almost any source. There are regular “Using 
Technology” sections throughout the text. Each of these sections displays and com- 
ments on output from the same three technologies, representing graphing calcula- 
tors (the Texas Instruments TI-83 or TI-84), spreadsheets (Microsoft Excel), and 
statistical software (JMP Minitab, and Crunchlt!). The output always concerns 
one of the main teaching examples, so that students can compare text and output. 
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6. Use assessments to improve and evaluate student learning. Within chapters, 
a few “Apply Your Knowledge” exercises follow each new idea or skill for a quick 
check of basic mastery—and also to mark off digestible bites of material. Each of 
the first four parts of the book ends with a review chapter that includes a point-by- 
point outline of skills learned, problems students can use to test themselves, and 
several supplementary exercises. (Instructors can choose to cover any or none of 
the chapters in Part V, so each of these chapters includes a skills outline.) The re- 
view chapters present supplemental exercises without the “I just studied that” con- 
text, thus asking for another level of learning. We think it is helpful to assign some 
supplemental exercises. Many instructors will find that the review chapters appear 
at the right points for pre-examination review. The “Test Yourself” questions can 
be used by students to review, self-assess, and prepare for such an examination. 

In addition, assessment materials in the form of a test bank and quizzes are avail- 
able online. 


What’s New? 


The new edition of The Basic Practice of Statistics brings many new examples and 
exercises. There are new data sets from a variety of sources, including finance 
(the relationship between positive articles in the media and the Dow Jones Indus- 
trial Average the following week), health (the relationship between salt intake and 
percent body fat of children), psychology (the relationship between one’s attitude 
about a presidential candidate and how trustworthy the candidate’s face appears 
to be), medicine (the relationship between playing video games and surgical skills), 
and the environment (global temperatures). Popular examples and exercises such 
as the Florida manatee regression example return, many with updated data. These 
are just a few of a large number of new data settings in this edition. 

A new edition is also an opportunity to introduce new features and polish the 
exposition in ways intended to help students learn. Here are some of the changes: 


Each chapter now contains references to online resources to enhance student 
learning. These include video clips, whiteboard lectures, and technology supplements. 


We have added an introductory chapter, “Getting Started,” that instructors 
may wish to assign to students the first day of classes. This chapter provides an 
overview of statistical thinking and real examples where the use of statistics can 
provide valuable insight. It expands on material that was previously included in 
the Preface, adding motivating examples and exercises. 


Chapter 7 includes descriptions of additional data sets available online that instructors 
can use for student projects and more extensive data analysis. Along with the 
description of the data sets, we provide a few suggestions for how they might be used. 


We have added some basic material on resampling and permutation tests in 
optional sections at the end of Chapters 15, 17, 20, and 21. We hope that 
instructors who want to introduce students to resampling methods will find this 
new material useful. 


The essay on data ethics is now Chapter 10, and follows the format of other 
chapters in the book. 


We have added output from JMP to the “Using Technology” sections. 


The content in Parts I and II has been rewritten to accommodate instructors who 
prefer to teach data production (Part II) before data exploration (Part I). Instructors 
can teach these parts in either order while maintaining the continuity of the material. 


Sections are now numbered for easier reference. 
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In this chapter we cover... 


ach chapter opener offers a brief overview of where 
the chapter is heading, often with reference to pre- 
vious chapters, and includes a section outline of the 


In this chapter 
we cover... 


74) 
2.2 
2.3 


2.4 


Measuring center: the mean 
Measuring center: the median 


Comparing the mean and the 
median 

Measuring variability: 

the quartiles 


major topics that will be covered. 2.5 The five-number summary 


FIGURE 2.5 

Stemplots comparing the distribu- 
tions of graduation rates for the four 
census regions from Table 1.1, for 
Example 2.9. 


(STATE?) Federal law requires all states in the United States to use a common com- 
putation of on-time high school graduation rates beginning with the 2010-11 school 
year. Previously, states chose one of several computation methods that gave answers 
that could differ by more than 10%, This common computation allows for meaningful 
comparison of graduation rates between the states. 

We know from Table 1.1 (page 22) that the on-time high school graduation rates 
varied from 59% in the District of Columbia to 88% in lowa. The U.S. Census Bureau 
divides the 50 states and the District of Columbia into four geographical regions: the 
Northeast (NE), Midwest (MW), South (S), and West (W). The region for each state is 
included in Table 1.1. Do the states in the four regions of the country display distinct 
distributions of graduation rates? How do the mean graduation rates of the states in 
each of these regions compare? 


(BQAN) Use graphs and numerical descriptions to describe and compare the distri- 
butions of on-time high school graduation rates of the states in the four regions of 
the United States. 


ESRI We might use boxplots to compare the distributions, but stemplots pre- 
serve more detail and work well for data sets of these sizes. Figure 2.5 displays the 
stemplots with the stems lined up for easy comparison. The stems have been split 
to better display the distributions. The stemplots overlap, and some care is needed 
when comparing the four stemplots as the sample sizes differ, with some stemplots 
having more leaves than others. None of the plots shows strong skewness, although 
the South has one low observation that stands apart from the others with this choice 
of stems. The states in the Northeast and Midwest have distributions that are similar 
to each other, as do those in the South and West. The graduation rates tend to be 
higher for the states in the Northeast and Midwest and more variable for the states 
in the South and West. With little skewness and no serious outliers, we report ¥ and 
as our summary measures of center and variability of the distribution of the on-time 
graduation rates of the states in each region: 


Region Mean Standard Deviation 
Midwest 82.92 4.25 
Northeast 82.56 3.47 
South 75.93 7.36 
West 73.58 6.73 


Northeast 
& 
6 
7 
7 
6 
6 
5 


[SSSR The table of summary statistics confirms what we see in the stem- 
plots. The states in the Midwest and Northeast are quite similar to each other, as are 
those in the South and West. The states in the Midwest and Northeast have a higher 
mean graduation rate as well as a smaller standard deviation than those in the South 
and West. 


Apply Your Knowledge 


Major concepts are immediately reinforced with 
problems that are interspersed throughout the 
chapter (often following examples). These prob- 
lems allow students to practice their skills concur- 
rently as they work through the text. 


XIV 


and boxplots 


2.6 Spotting suspected outliers 
and modified boxplots* 


2.7 Measuring variability: the 
standard deviation 


2.8 Choosing measures of center 
and variability 


2.9 Using technology 


Organizing a statistical 
problem 


4-Step Examples 


In Chapter 2, students learn how to use the four- 
step process for working through statistical prob- 
lems: State, Plan, Solve, Conclude. By observing this 
framework in use in selected examples throughout 
the text and practicing it in selected exercises, stu- 
dents develop the ability to solve and write reports 
on real statistical problems encountered outside the 
classroom. 
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2.10 X and s by Hand. Radon is a naturally occurring gas and is the second 
leading cause of lung cancer in the United States.’ It comes from the natural 
breakdown of uranium in the soil and enters buildings through cracks and 
other holes in the foundations. Found throughout the United States, levels 
vary considerably from state to state. Several methods can reduce the levels 
of radon in your home, and the Environmental Protection Agency recom- 
mends using one of these if the measured level in your home is above 4 pico- 
curies per liter. Four readings from Franklin County, Ohio, where the county 
average is 8.4 picocuries per liter, were 6.2, 12.8, 7.6, and 15.4. 

(a) Find the mean step-by-step. That is, find the sum of the four observations 
and divide by 4. 


(b) Find the standard deviation step-by-step. That is, find the deviation 
of each observation from the mean, square the deviations, then 
obtain the variance and the standard deviation. Example 2.7 shows 
the method. 


Now enter the data into your calculator and use the mean and standard 
deviation buttons to obtain X and s, Do the results agree with your hand 
calculations? 


FEATURES OF THE BASIC PRACTICE OF STATISTICS Seventh Edition 


() LaunchPad Online Resources 


Many sections end with references to the most relevant 
and helpful online resources (chosen by the authors and 
available in LaunchPad) for students to use for further 
explanation or practice. 


Using Technology 


(9) LaunchPad Online Resources 


* The Snapshots video, Summarizing Quantitative Data, provides an overview 
of the need for measures of center and variability as well as some details of 
the computations. 


* The StatClips Examples video, Summaries of Quantitative Data Example 


C, gives the details for the computation of the mean, median, and standard 
deviation in a small example. You can verify the computations along with the 
video, either by hand or using your technology. 


* The StatClips Examples videos, Basic Principles of Exploring Data Example 
B and Basic Principles of Exploring Data Example C, emphasize the need to 
examine outliers and understand them, rather than simply discarding obser- 
vations that don’t seem to fit. 


Located where most appropriate, these special sections display and comment on 
the output from graphing calculators, spreadsheets, and statistical software in the 
context of examples from the text. 


Although a calculator with “two-variable statistics” functions will do the basic 
calculations we need, more elaborate tools are helpful. Graphing calculators and 
computer software will do calculations and make graphs as you command, freeing 
you to concentrate on choosing the right methods and interpreting your results. 
Figure 2.4 displays output describing the travel times to work of 20 people in New 
York State (Example 2.3). Can you find X, s, and the fivenumber summary in each 
output? The big message of this section is: Once you know what to look for, you can 
read output from any technological tool. 

The displays in Figure 2.4 come from a Texas Instruments gra 
tor, the Minitab, Crunchlt!, and JMP statistical programs, and the Microsoft 
Excel spreadsheet program. Minitab and JMP allow you to choose what descrip- 
tive measures you want, whereas the descriptive measures in the CrunchIt! out 
put are provided by default. Excel and the calculator give some things we don't 
need. Just ignore the extras. Excel’s “Descriptive Statistics” menu item doesn't 
give the quartiles. We used the spreadsheet’s separate quartile function to get 


Q; and Q;. 


phing calcula- 


Texas Instruments Graphing Calculator 
i-Var Stats 
2731,25 
3x=625 
3x2=28625 
Sx=21.8773495 
ox=21, 32346264 
4n=26 


Descriptive Statistics: NYtime 


Total 


variable Count Mean StDev Variance Minimum 
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Microsoft Excel 
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FIGURE 2.4 

Output from a graphing calculator, three 
Statistical software packages, and a 
spreadsheet program describing the 
data on travel times to work in New 
York State. 
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HE SAID, 


SHE SAID. 
Height, weight, and body mass 
distributions in this book come from 
actual measurements by a government 
survey. That is a good thing. When 
asked their weight, almost all women 


say they weigh less than they really do. 

Heavier men also underreport their 

weight—but lighter men claim to 

weigh more than the scale shows. We 

leave you to ponder the psychology 

of the two sexes. Just remember that 

& ay : ; 
say-so” is no substitute for measuring. 


CHAPTER 4 SUMMARY 


Chapter Specifics 


m To study relationships between variables, we must measure the variables on the same 
group of individuals. 

m If we think that a variable x may explain or even cause changes in another variable y, 
we call x an explanatory variable and y a response variable. 

= Ascatterplot displays the relationship between two quantitative variables measured on 
the same individuals. Mark values of one variable on the horizontal axis (x axis) and 
values of the other variable on the vertical axis (y axis). Plot each individual’s data as 
a point on the graph. Always plot the explanatory variable, if there is one, on the x axis 
of a scatterplot. 

g Plot points with different colors or symbols to see the effect of a categorical variable in 
a scatterplot. 

mg Inexamining a scatterplot, look for an overall pattern showing the direction, form, and 
strength of the relationship and then for outliers or other deviations from this pattern. 

a Direction: If the relationship has a clear direction, we speak of either positive associa- 
tion (high values of the two variables tend to occur together) or negative association 
(high values of one variable tend to occur with low values of the other variable). 

= Form: Linear relationships, where the points show a straight-line pattern, are an. 

important form of relationship between two variables. Curved relationships and clus- 

ters are other forms to watch for. 


Strength: The strength of a relationship is determined by how close the points in the 
scatterplot lie to a simple form such as a line. 

m= The correlation r measures the direction and strength of the linear association 
between two quantitative variables x and y. Although you can calculate a correlation 
for any scatterplot, r measures only straight-line relationships. 


Check Your Skills and 


Chapter Exercises 


Each chapter ends with a series of multiple- 
choice problems that test students’ understand- 
ing of basic concepts and their ability to apply 
the concepts to real-world statistical situations. 
The multiple-choice problems are followed by a 
set of more in-depth exercises that allow students 
to make judgments and draw conclusions based 
on real data and real scenarios. 


Link It 


In Chapters 1 to 3, we focused on exploring features of a single variable. In this chapter, we 
continued our study of exploratory data analysis but for the purpose of examining relation- 
ships between variables. A useful tool for exploring the relationship between two variables 
is the scatterplot. When the relationship is linear, correlation is a numerical measure of the 
strength of the linear relationship. 

It is tempting to assume that the patterns we observe in our data hold for values of our 
variables that we have not observed—in other words, that additional data would continue 
to conform to these patterns. The process of identifying underlying patterns would seem 
to assume that this is the case. But is this assumption justified? Parts II to V of the book 
answer this question. 


Statistics in Your World 


These brief asides in each chapter illustrate major concepts 
or present cautionary tales through entertaining and relevant 
stories, allowing students to take a break from the exposition 
while staying engaged. 


4.14 Researchers collect data on 5,134 American adults 


younger than 60, They measure the reaction times (in 

seconds) of each subject to a stimulus on a computer 

screen and how many years later the subject died.'° 
The researchers are interested in whether reaction 

time can predict time to death (in years). When you 

make a scatterplot, the explanatory variable on the x axis 

(a) is the reaction time. 

(b) is the time to death. 

(c) can be either reaction time or time to death. 

The researchers in Exercise 4.14 found that people 

with slower reaction times tended to die sooner. In 

a scatterplot of the reaction time and the number of 

years to death, you expect to see 

(a) a positive association. 

(b) very little association. 

(c) a negative association. 

Figure 4.7 is a scatterplot of school GPA against IQ 

test scores for 15 seventh-grade students. There is one 

low outlier in the plot. The IQ and GPA scores for this 

student are 

(a) IQ = 0.5, GPA = 103. 

(b) IQ = 103, GPA = 0.5. 

(c) IQ = 103, GPA = 7.6. 

If we leave out the low outlier, the correlation for the 

remaining 14 points in Figure 4.7 is closest to 


(a) 0.9. (b) -0.9. (©) OL. 


major golf tournaments. Figure 4.8 is a scatterplot of 
the scores for the first two rounds of the 2013 Masters 
for all the golfers entered. Only the 60 golfers with 
the lowest two-round total advance to the final two 
rounds (unless several people are tied for 60th place, 
in which case all those tied for 60th place advance). 
The plot has a grid pattern because golf scores must 
be whole numbers.'* MASTR13 


(a) Read the graph: What was the lowest score in 
the first round of play? How many golfers had this 
low score? What were their scores in the second 
round? 

(b) Read the graph: Alan Dunbar had the highest 
score in the second round. What was this score? 
What was Dunbar’s score in the first round? 


4.18 


Chapter Summary and Link It 


Each chapter concludes with a summary of the chapter specifics, 
including major terms and processes, followed by a brief discus- 
sion of how the chapter links to material from both previous and 
upcoming chapters. 


| CHECK YOUR SKILLS 


What are all the values that a correlation r can pos- 
sibly take? 


(a)r2=0 (bt) OS r=1 


If the correlation between two variables is close to 0, 


() -lsr<1 


you can conclude that a scatterplot would show 

(a) a strong straight-line pattern. 

(b) a cloud of points with no visible pattern. 

(c) no straight-line pattern, but there might be a strong 
pattern of another form. 

The points on a scatterplot lie very close to a straight 
line. The correlation between x and y is close to 

(a) —1. (b) 1. (©) either —1 or 1, we can'tsay which. 
A statistics professor warns her class that her second 
midterm is always harder than the first. She tells her 
class that students always score 10 points worse on the 
second midterm compared to their score on the first 
midterm. This means that the correlation between 
students’ scores on the first and second exam is 

(a) 1. (b) -1. (©) Can't tell without seeing the data. 
Researchers asked mothers how much soda (in ounces) 
their kids drank in a typical day. They also asked 
these mothers to rate how aggressive their kids were 
ona scale of | to 10, with larger values corresponding 
to a greater degree of aggression." The correlation 
between amount of soda consumed and aggression 
rating was found to be r = 0.3. If the researchers had 
measured amount of soda consumed in liters instead 


| CHAPTER 4 EXERCISES 


4.24 Scores at the Masters. The Masters is one of the four 


4.25 Happy states. Human happiness or well-being can be 


assessed either subjectively or objectively. Subjective 
assessment can be accomplished by listening to what 
people say. Objective assessment can be made from 
data related to well-being such as income, climate, 
availability of entertainment, housing prices, lack of 
traffic congestion, and so on. Do subjective and ob- 
jective assessments agree? To study this, investiga- 
tors made both subjective and objective assessments 
of happiness for each of the 50 states. The subjective 
measurement was the mean score on a life-satisfaction 
question found on the Behavioral Risk Factor Surveil- 
lance System (BRFSS), which is a state-based system of 
health surveys. Lower scores indicate a greater degree 
of happiness. To objectively assess happiness, the in- 
vestigators computed a mean well-being score (called 
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Web Exercises 


A final set of exercises asks students to investigate data and statistical issues by 
researching topics online. These exercises tend to be more involved and provide 
an opportunity for students to dig deep into contemporary issues and special ap- 
plications of statistics. 


Exploring the Web 


6.34 Promoting women. In academics, faculty typically start as assistant professors, 
are promoted to associate professor (and gain tenure), and finally reach the rank of 
full professor. Some have argued that women have a harder time gaining promotion 
to associate and full professor than do men. Do data support this argument? Search 
the web to find the number of faculty by rank and gender at some university. Do 
you see a pattern that suggests that the proportion of women decreases as rank 
increases? We found several sources of data by doing a Google search on “faculty 
head count by rank and gender.” In addition to discussing the pattern you find, 
provide the data, the name of the school, and the source of the data. 


Accidental deaths and age. Accidental deaths are shocking and tragic. Do the 
ways in which people die by accident change with age? Look at the most recent 
Statistical Abstract of the United States (www.census.gov/compendia/ 
statab/) and make a two-way table that provides the counts of deaths due 
to accidents from various causes for three different age groups. What do you 
conclude? 


Simpson’s paradox. Find an example of Simpson’s paradox and discuss 
how your example illustrates the paradox. Two examples that we found (thanks 
to Patricia Humphrey at Georgia Southern University) are www.nytimes. 
com/2006/07/15/education/15report .html and online.wsj.com/ 
article/SB125970744553071829.html. 


Online Data for Additional Analyses 


References to larger data sets are suggested in Chapter 7 to provide an opportunity for 
students to apply the methods of Chapters 1-6 to explore data on their own. This is 
intended to reinforce the idea of exploratory data analysis as a tool for exploring data. 


Online Data for Additional Analyses 


1. SAT, ACT, and teacher salaries for 2013 for each of the 50 states and the District of 
Columbia are available in the data set SATACT. One could use these data to carry 
out analyses for ACT scores similar to those for the SAT scores in Chapters 5 and 6. 


For example, repeat the analyses in Exercises 5.50 (page 157) and 5.51 (page 158) 


using ACT scores instead of SAT scores. SATACT 


The data set MLB contains hitting, pitching, fielding, sal 
mance data from the 2013 season for all major league ba: 


ary, and win—loss perfor- 
seball teams. These data 


can be used to determine the correlation between payroll and winning percent- 


age. One can also explore what variables are most highly 


correlated with winning 


percentage, and whether variables that measure pitching performance are more 


highly correlated with winning percentage than variab! 


les that measure hitting 


performance. For example, calculate the correlation between winning percentage 


and number of home runs, between winning percentage 
tween winning percentage and ERA, between winning p: 
by pitchers, and between winning percentage and payroll 


and batting average, be- 
ercentage and strikeouts 


. Which has the highest 


correlation? These data are from http: //www.baseball-reference. 


com/. Visit this website for definitions of several of t 


Historical temperature data and whether Punxsutawney 


he variables in the data 


Phil saw his shadow are 


available in the data set PHIL. Repeat the analysis in Exercise 6.32 (page 177), 
but define what constitutes “six more weeks of winter-like weather” differently. 
For example, you might decide there were six more weeks of winter-like weather if 


average temperatures for March were at least one degree below historical averages. 


Data from the Ohio Department of Health website are available in the pdf 
“20130OHH Detail Tables.” This is a source of many tables that can be used for 


further analyses using methods discussed in Chapter 6. For example, conduct an 
analysis like that in Exercise 7.52 to investigate the relationship between sex and 
strategies about weight (Question 67 in the Tables). HEALTH 


The data set WHAT contains three variables and 3848 observations on each. At 
one time, this was considered a large data set and difficult to explore with software. 
Use various exploratory methods available in software packages such as JMP and 
Minitab to find the “hidden pattern” in these data. £jy)) wat 
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Why Did You Do That? 


There is no single best way to organize our presentation of statistics to beginners. 
That said, our choices reflect thinking about both content and pedagogy. Here are 
comments on several “frequently asked questions” about the order and selection of 
material in The Basic Practice of Statistics. 


e Why does the distinction between population and sample not appear in Part I? 
There is more to statistics than inference. In fact, statistical inference is appropriate 
only in rather special circumstances. The chapters in Part I present tools and tactics 
for describing data—any data. These tools and tactics do not depend on the idea of 
inference from sample to population. Many data sets in these chapters (for example, 
the several sets of data about the 50 states) do not lend themselves to inference be- 
cause they represent an entire population. John Tukey of Bell Labs and Princeton, the 
philosopher of modern data analysis, insisted that the population—sample distinction 
be avoided when it is not relevant. He used the word “batch” for data sets in general. 
We see no need for a special word, but we think Tukey was right. 


¢ Why not begin with data production? We prefer to begin with data exploration 
(Part I), as most students will use statistics mainly in settings other than planned 
research studies in their future employment. We place the design of data produc- 
tion (Part II) after data analysis to emphasize that data-analytic techniques apply 
to any data. However, it is equally reasonable to begin with data production—the 
natural flow of a planned study is from design to data analysis to inference. Because 
instructors have strong and differing opinions on this question, these two topics are 
now the first two parts of the book, with the text written so that it may be started 
with either Part I or Part II while maintaining the continuity of the material. 


¢ Why do Normal distributions appear in Part I? Density curves such as the Nor- 
mal curves are just another tool to describe the distribution of a quantitative variable, 
along with stemplots, histograms, and boxplots. Professional statistical software offers 
to make density curves from data just as it offers histograms. We prefer not to sug- 
gest that this material is essentially tied to probability, as the traditional order does. 
And we find it helpful to break up the indigestible lump of probability that troubles 
students so much. Meeting Normal distributions early does this and strengthens the 
“probability distributions are like data distributions” way of approaching probability. 


e Why not delay correlation and regression until late in the course, as was 
traditional? The Basic Practice of Statistics begins by offering experience working 
with data and gives a conceptual structure for this nonmathematical but essential 
part of statistics. Students profit from more experience with data and from seeing 
the conceptual structure worked out in relations among variables as well as in 
describing single-variable data. Correlation and least-squares regression are very 
important descriptive tools and are often used in settings where there is no popula- 
tion-sample distinction, such as studies of all of a firm’s employees. Perhaps most 
important, the approach taken by The Basic Practice of Statistics asks students to 
think about what kind of relationship lies behind the data (confounding, lurking 
variables, association doesn’t imply causation, and so on), without overwhelming 
them with the demands of formal inference methods. Inference in the correlation 
and regression setting is a bit complex, demands software, and often comes right at 
the end of the course. We find that delaying all mention of correlation and regres- 
sion to that point means that students often don’t master the basic uses and proper- 
ties of these methods. We consider Chapters 4 and 5 (correlation and regression) 
essential and Chapter 26 (regression inference) optional. 
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¢ Why use the z procedures for a population mean to introduce the reasoning 
of inference? This is a pedagogical issue, not a question of statistics in practice. 
The two most popular choices for introducing inference are zy for a mean and z 
for a proportion. (Another option is resampling and permutation tests. We have 
included material on these topics, but have not used them to introduce inference.) 


We find z for means quite accessible to students. Positively, we can say up front 
that we are going to explore the reasoning of inference in the overly simple setting 
described in the box on page 374 titled Simple Conditions for Inference about a 
Mean. As this box suggests, exactly Normal population and true simple random 
sample are as unrealistic as known o. All the issues of practice—robustness against 
lack of Normality and application when the data aren’t an SRS as well as the need 
to estimate o—are put off until, with the reasoning in hand, we discuss the practi- 
cally useful t procedures. This separation of initial reasoning from messier practice 
works well. 

Negatively, starting with inference for p introduces many side issues: no exact 
Normal sampling distribution, but a Normal approximation to a discrete distribu- 
tion; use of p in both the numerator and denominator of the test statistic to esti- 
mate both the parameter p and p’s own standard deviation; loss of the direct link 
between test and confidence interval; and the need to avoid small and moderate 
sample sizes because the Normal approximation for the test is quite unreliable. 

There are advantages to starting with inference for p. Starting with z for means 
takes a fair amount of time and the ideas need to be rehashed with the introduc- 
tion of the t procedures. Many instructors face pressure from client departments 
to cover a large amount of material in a single semester. Eliminating coverage of 
the “unrealistic” z for means with known variance enables instructors to cover 
additional, more realistic applications of inference. Also, many instructors believe 
that proportions are simpler and more familiar to students than means. For instruc- 
tors who would prefer to introduce inference with z for a proportion, we recom- 
mend our book, Statistics in Practice. 


¢ Why didn’t you cover Topic X? Introductory texts ought not to be encyclope- 
dic. We chose topics on two grounds: they are the most commonly used in practice, 
and they are suitable vehicles for learning broader statistical ideas. Students who 
have completed the core of the book, Chapters 1 to 12 and 15 to 24, will have little 
difficulty moving on to more elaborate methods. Chapters 25 to 27 offer a choice of 
slightly more advanced topics, as do the four companion chapters available online. 
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MEDIA AND SUPPLEMENTS 


) LaunchPad 


W.H. Freeman’s new online homework system, LaunchPad, offers our quality content 
curated and organized for easy assignability in a simple but powerful interface. We’ve 
taken what we’ve learned from thousands of instructors and hundreds of thousands of 
students to create a new generation of W. H. Freeman/Macmillan technology. 


Curated Units. Combining a curated collection of videos, homework sets, tutori- 
als, applets, and e-Book content, LaunchPad’s interactive units give instructors a 
building block to use as is or as a starting point for customized learning units. A ma- 
jority of exercises from the text can be assigned as online homework, including an 
abundance of algorithmic exercises. An entire unit’s worth of work can be assigned 
in seconds, drastically reducing the amount of time it takes for instructors to have 
their course up and running. 


Easily customizable. Instructors can customize the LaunchPad units by adding 
quizzes and other activities from our vast wealth of resources. They can also add 
a discussion board, a dropbox, and RSS feed, with a few clicks. LaunchPad allows 
instructors to customize students’ experience as much or as little as desired. 


Useful analytics. The gradebook quickly and easily allows instructors to look up 
performance metrics for classes, individual students, and individual assignments. 


Intuitive interface and design. The student experience is simplified. Students’ 
navigation options and expectations are clearly laid out at all times, ensuring they 
can never get lost in the system. 


Assets integrated into LaunchPad include 
the following: 


Interactive e-Book. Every LaunchPad e-Book comes with powerful study tools 
for students, video and multimedia content, and easy customization for instructors. 
Students can search, highlight, and bookmark, making it easier to study and access 
key content. And teachers can ensure that their classes get just the book they want 
to deliver: customize and rearrange chapters, add and share notes and discussions, 
and link to quizzes, activities, and other resources. 


LEARNINGCurve LearningCurve provides students and instructors with pow- 
erful adaptive quizzing, a game-like format, direct links to the e-Book, and instant 
feedback. The quizzing system features questions tailored specifically to the text 
and adapts to students’ responses, providing material at different difficulty levels 
and topics based on student performance. 


¥ SolutionMaster SolutionMaster offers an easy-to-use web-based version of 
the instructor’s solutions, allowing instructors to generate a solution file for any set 
of homework exercises. 


MEDIA AND SUPPLEMENTS 


New StatBoards videos are brief whiteboard videos that illustrate difficult topics 
through additional examples, written and explained by a select group of statistics 
educators. 


New Stepped Tutorials are centered on algorithmically generated quizzing with 
step-by-step feedback to help students work their way toward the correct solution. These 
new exercise tutorials (two to three per chapter) are easily assignable and assessable. 


Statistical Video Series consists of StatClips, StatClips Examples, and Statisti- 
cally Speaking “Snapshots.” View animated lecture videos, whiteboard lessons, and 
documentary-style footage that illustrate key statistical concepts and help students 
visualize statistics in real-world scenarios. 


New Video Technology Manuals available for TI-83/84 calculators, Minitab, 
Excel, JMP, SPSS, R, Remdr, and Crunchlt!® provide brief instructions for using 
specific statistical software. 


Updated StatTutor Tutorials offer multimedia tutorials that explore important 
concepts and procedures in a presentation that combines video, audio, and inter- 
active features. The newly revised format includes built-in, assignable assessments 
and a bright new interface. 


Updated Statistical Applets give students hands-on opportunities to familiarize 
themselves with important statistical concepts and procedures, in an interactive set- 
ting that allows them to manipulate variables and see the results graphically. Icons 
in the textbook indicate when an applet is available for the material being covered. 


Crunchlt!® is W. H. Freeman’s web-based statistical software that allows users to 
perform all the statistical operations and graphing needed for an introductory sta- 
tistics course and more. It saves users time by automatically loading data from BPS, 
and it provides the flexibility to edit and import additional data. 


Jmp JMP Student Edition (developed by SAS) is easy to learn and contains all 
the capabilities required for introductory statistics, including pre-loaded data sets from 
BPS. JMP is the leading commercial data analysis software of choice for scientists, engi- 
neers, and analysts at companies throughout the globe (for Windows and Mac). 


Stats@Work Simulations put students in the role of the statistical consultant, 
helping them better understand statistics interactively within the context of real- 
life scenarios. 


EESEE Case Studies (Electronic Encyclopedia of Statistical Examples and Exer- 
cises), developed by The Ohio State University Statistics Department, teach students 
to apply their statistical skills by exploring actual case studies using real data. 


Data files are available in Crunchlt!, JMP, ASCII, Excel, TI, Minitab, and SPSS 
(an IBM Company) * formats. 


Student Solutions Manual provides solutions to the odd-numbered exercises 
in the text. It is available electronically within LaunchPad, as well as in print form. 


Interactive Table Reader allows students to use statistical tables interactively 
to seek the information they need. 


Instructor’s Guide with Full Solutions includes teaching suggestions, chap- 
ter comments, and detailed solutions to all exercises. It is available electronically 
within LaunchPad. 


*SPSS was acquired by IBM in October 2009. 
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MEDIA AND SUPPLEMENTS 


Test Bank offers hundreds of multiple-choice questions. It is also available on 
CD-ROM (for Windows and Mac), where questions can be downloaded, edited, 
and resequenced to suit each instructor’s needs. 


Lecture PowerPoint Slides offer a customizable, detailed lecture presentation 
of statistical concepts covered in each chapter of BPS. 


Additional Resources Available with BPS 


Companion Website www.whfreeman.com/bps7e This open-access web- 
site includes statistical applets, data files, supplementary exercises, and self-quizzes. 
The website also offers companion chapters covering nonparametric tests, multiple 
regression, further topics in ANOVA, and statistics for quality control and capa- 
bility. Instructor access to the Companion Website requires user registration as an 
instructor and features all the open-access student web materials, plus: 


¢ Instructor’s Guide with Full Solutions 
° Test Bank 
¢ Lecture PowerPoint Slides containing all textbook figures and tables 


¢ Instructor version of EESEE with solutions to the exercises in the student 
version 


Special Software Packages Student versions of JMP and Minitab are avail- 
able for packaging with the text. JMP is available inside LaunchPad at no addi- 
tional cost. Contact your W. H. Freeman representative for information or visit 
www.whfreeman.com. 


Course Management Systems W. H. Freeman and Company provides 
courses for Blackboard, Angel, Desire2Learn, Canvas, Moodle, and Sakai 
course management systems. These are completely integrated solutions that in- 
structors can customize and adapt to meet teaching goals and course objectives. 
Visit macmillanhighered.com/Catalog/other/Coursepack for more information. 


bclicker i-clicker is a two-way radio-frequency classroom response solution 
developed by educators for educators. Each step of i-clicker’s development has 
been informed by teaching and learning. To learn more about packaging i-clicker 
with this textbook, please contact your local sales rep or visit www .iclicker.com. 
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hat’s hot in popular music this week? SoundScan knows. SoundScan col- 

lects data electronically from the cash registers in more than 14,000 retail 

outlets and also collects data on download sales from websites. When you 
buy a CD or download a digital track, the checkout scanner or website is prob- 
ably telling SoundScan what you bought. SoundScan provides this information 
to Billboard magazine, MTV, and VHI, as well as to record companies and artists’ 
agents. 

Should women take hormones such as estrogen after menopause, when natural 
production of these hormones ends? In 1992, several major medical organizations 
said “Yes.” In particular, women who took hormones seemed to reduce their risk 
of a heart attack by 35% to 50%. The risks of taking hormones appeared small 
compared with the benefits. But in 2002, the National Institutes of Health declared 
these findings wrong. Use of hormones after menopause immediately plummeted. 
Both recommendations were based on extensive studies. What happened? 

Is the climate warming? Is it becoming more extreme? An overwhelming major- 
ity of scientists now agree that the earth is undergoing major changes in climate. 
Enormous quantities of data are continuously being collected from weather stations, 
satellites, and other sources to monitor factors such as the surface temperature on 
land and sea, precipitation, solar activity, and the chemical composition of air and 
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water. Climate models incorporate this information to make projections of future 
climate change and can help us understand the effectiveness of proposed solutions. 

SoundScan, medical studies, and climate research all produce data (numerical 
facts), and lots of them. Using data effectively is a large and growing part of most 
professions, and reacting to data is part of everyday life. In fact, we define statistics 
as the science of learning from data. 

Although data are numbers, they are not “just numbers.” Data are numbers with 
a context. The number 8.5, for example, carries no information by itself. But if we 
hear that a friend’s new baby weighed 8.5 pounds at birth, we congratulate her on 
the healthy size of the child. The context engages our background knowledge and 
allows us to make judgments. We know that a baby weighing 8.5 pounds is a little 
above average, and that a human baby is unlikely to weigh 8.5 ounces or 8.5 kilo- 
grams (over 18 pounds). The context makes the number informative. 

To gain insight from data, we make graphs and do calculations. But graphs and 
calculations are guided by ways of thinking that amount to educated common 
sense. Let’s begin our study of statistics with an informal look at some aspects of 
statistical thinking.' 


Although, data can be collected in a variety of ways, the type of conclusion that 
can be reached from the data depends on how the data were obtained. Observational 
studies and experiments are two common methods for collecting data. Let’s take a 
closer look at the hormone replacement data to understand the differences. 


What’s behind the flip-flop in the advice offered to women about hormone replace- 
ment? The evidence in favor of hormone replacement came from a number of obser- 
vational studies that compared women who were taking hormones with others who 
were not. But women who choose to take hormones are very different from women 
who do not: they are richer and better educated and see doctors more often. These 
women do many things to maintain their health. It isn’t surprising that they have fewer 
heart attacks. 

Large and careful observational studies are expensive, but they are easier to 
arrange than careful experiments. Experiments don’t let women decide what to do. 
They assign women either to hormone replacement or to dummy pills that look and 
taste the same as the hormone pills. The assignment is done by a coin toss, so that 
all kinds of women are equally likely to get either treatment. Part of the difficulty of 
a good experiment is persuading women to accept the result—invisible to them—of 
the coin toss. By 2002, several experiments agreed that hormone replacement does 
not reduce the risk of heart attacks, at least for older women. Faced with this better 
evidence, medical authorities changed their recommendations.’ m 


Women who chose hormone replacement after menopause were on the average 
richer and better educated than those who didn’t. No wonder they had fewer heart 
attacks. We can’t conclude that hormone replacement reduces heart attacks just 
because we see this relationship in data. In this example, education and affluence 
are background factors that help explain the relationship between hormone replace- 
ment and good health. 

Children who play soccer do better in school (on the average) than children 
who don’t play soccer. Does this mean that playing soccer increases school grades? 
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Children who play soccer tend to have prosperous and well-educated parents. Once 
again, education and affluence are background factors that help explain the rela- 
tionship between soccer and good grades. 

Almost all relationships between two observed characteristics or “variables” are influ- 
enced by other variables lurking in the background. To understand the relationship 
between two variables, you must often look at other variables. Careful statistical 
studies try to think of and measure possible lurking variables in order to correct for 
their influence. As the hormone saga illustrates, this doesn’t always work well. News 
reports often just ignore possible lurking variables that might ruin a good headline 
like “Playing soccer can improve your grades.” The habit of asking, “What might lie 
behind this relationship?” is part of thinking statistically. 

Of course, observational studies are still quite useful. We can learn from 
observational studies how chimpanzees behave in the wild or which popular 
songs sold best last week or what percent of workers were unemployed last month. 
SoundScan’s data on popular music and the government’s data on employment 
and unemployment come from sample surveys, an important kind of observational 
study that chooses a part (the sample) to represent a larger whole. Opinion polls 
interview perhaps 1000 of the 235 million adults in the United States to report 
the public’s views on current issues. Can we trust the results? We'll see that this 
isn’t a simple yes-or-no question. Let’s just say that the government’s unemploy- 
ment rate is much more trustworthy than opinion poll results, and not just 
because the Bureau of Labor Statistics interviews 60,000 people rather than 1000. 
We can, however, say right away that some samples can’t be trusted. Consider the 
following write-in poll. 


The advice columnist Ann Landers once asked her readers, “If you had it to do over 
again, would you have children?” A few weeks later, her column was headlined “70% 
OF PARENTS SAY KIDS NOT WORTH IT.” Indeed, 70% of the nearly 10,000 parents 
who wrote in said they would not have children if they could make the choice again. 
Those 10,000 parents were upset enough with their children to write Ann Landers. 
Most parents are happy with their kids and don’t bother to write. m 


Statistically designed samples, even opinion polls, don’t let people choose them- 
selves for the sample. They interview people selected by impersonal chance so that 
everyone has an equal opportunity to be in the sample. Such a poll showed that 91% 
of parents would have children again. Where data come from matters a lot. If you are 
careless about how you get your data, you may announce 70% “no” when the truth 
is close to 90% “yes.” Understanding the importance of where data come and its 
relationship to the conclusions that can be reached is an important part of learning 
to think statistically. 


Yogi Berra, the Hall of Fame New York Yankee, said it: “You can observe a lot by 
just watching.” That’s a motto for learning from data. A few carefully chosen graphs 
are often more instructive than great piles of numbers. Consider the outcome of the 
2000 presidential election in Florida. 
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Elections don’t come much closer: after much recounting, state officials declared that 
George Bush had carried Florida by 537 votes out of almost 6 million votes cast. 
Florida’s vote decided the 2000 presidential election and made George Bush, rather 
than Al Gore, president. Let’s look at some data. Figure 0.1 displays a graph that plots 
votes for the third-party candidate Pat Buchanan against votes for the Democratic 
candidate Al Gore in Florida’s 67 counties. 
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FIGURE 0.1 


Votes in the 2000 presidential election 1 1 1 1 T 1 T T T 
for Al Gore and Patrick Buchanan in 0) 50,000 100,000 150,000 200,000 250,000 300,000 350,000 400,000 


Florida's 67 counties. What happened Votes for Gore 
in Palm Beach County? 


What happened in Palm Beach County? The question leaps out from the graph. In 
this large and heavily Democratic county, a conservative third-party candidate did far 
better relative to the Democratic candidate than in any other county. The points for 
the other 66 counties show votes for both candidates increasing together in a roughly 
straight-line pattern. Both counts go up as county population goes up. Based on 
this pattern, we would expect Buchanan to receive around 800 votes in Palm Beach 
County. He actually received more than 3400 votes. That difference determined the 
election result in Florida and in the nation. m 


The graph demands an explanation. It turns out that Palm Beach County used 
a confusing “butterfly” ballot (see photo on page 5), in which candidate names on 
both left and right pages led to a voting column in the center. It would be easy for a 
voter who intended to vote for Gore to in fact cast a vote for Buchanan. The graph 
is convincing evidence that this in fact happened. 

Most statistical software will draw a variety of graphs with a few simple com- 
mands. Examining your data with appropriate graphs and numerical summaries is 
the correct place to begin most data analyses. These can often reveal important 
patterns or trends that will help you understand what your data has to say. 


0.3 Variation is Everywhere 5 


Bruce Weaver/AFP/Getty Images 


The company’s sales reps file into their monthly meeting. The sales manager 
rises. “Congratulations! Our sales were up 2% last month, so we're all drinking 
champagne this morning. You remember that when sales were down 1% last month 
I fired half of our reps.” This picture is only slightly exaggerated. Many managers 
overreact to small short-term variations in key figures. Here is Arthur Nielsen, for- 
mer head of the country’s largest market research firm, describing his experience: 


Too many business people assign equal validity to all numbers printed on 
paper. They accept numbers as representing Truth and find it difficult 
to work with the concept of probability. They do not see a number as a 
kind of shorthand for a range that describes our actual knowledge of the 
underlying condition.’ 


Business data such as sales and prices vary from month to month for reasons 
ranging from the weather to a customer's financial difficulties to the inevitable 
errors in gathering the data. The manager’s challenge is to say when there is a real 
pattern behind the variation. We’ll see that statistics provides tools for understand- 
ing variation and for seeking patterns behind the screen of variation. Let’s look at 
some more data. 


Figure 0.2 plots the average price of a gallon of regular unleaded gasoline each week 
from September 1990 to June 2013.* There certainly is variation! But a close look 
shows a yearly pattern: gas prices go up during the summer driving season, then 
down as demand drops in the fall. On top of this regular pattern, we see the effects 
of international events. For example, prices rose when the 1990 Gulf War threatened 
oil supplies and dropped when the world economy turned down after the September 
11, 2001, terrorist attacks in the United States. The years 2007 and 2008 brought 
the perfect storm: the ability to produce oil and refine gasoline was overwhelmed 
by high demand from China and the United States and continued turmoil in the oil- 
producing areas of the Middle East and Nigeria. Add in a rapid fall in the value of 
the dollar, and prices at the pump skyrocketed to more than $4 per gallon. In 2010 
the Gulf oil spill also affected supply and hence prices. The data carry an important 
message: because the United States imports much of its oil, we can’t control the price 
we pay for gasoline. m 
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FIGURE 0.2 

Variation is everywhere: the average 
retail price of regular unleaded gasoline, 
1990 to mid 2013. 
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Variation is everywhere. Individuals vary; repeated measurements on the same indi- 


vidual vary; almost everything varies over time. One reason we need to know some 
statistics is that it helps us deal with variation and to describe the uncertainty in 
our conclusions. Let’s look at another example to see how variation is incorporated 
into our conclusions. 


Cervical cancer, once the leading cause of cancer deaths among women, is the 
easiest female cancer to prevent with regular screening tests and follow-up. Almost 
all cervical cancers are caused by human papillomavirus (HPV). The first vaccine to 
protect against the most common varieties of HPV became available in 2006. The 
Centers for Disease Control and Prevention recommend that all girls be vaccinated at 
age 11 or 12. In 2011, the CDC made the same recommendation for boys, to protect 
against anal and throat cancers caused by the HPV virus. 

How well does the vaccine work? Doctors rely on experiments (called “clinical 
trials” in medicine) that give some women the new vaccine and others a dummy vac- 
cine. (This is ethical when it is not yet known whether or not the vaccine is safe and 
effective.) The conclusion of the most important trial was that an estimated 98% of 
women up to age 26 who are vaccinated before they are infected with HPV will avoid 
cervical cancers over a three-year period. 

Women who get the vaccine are much less likely to get cervical cancer. But 
because variation is everywhere, the results are different for different women. Some 
vaccinated women will get cancer, and many who are not vaccinated will escape. 
Statistical conclusions are “on the average” statements only, and even these “on the 
average” statements have an element of uncertainty. Although we can't be 100% 
certain that the vaccine reduces risk on the average, statistics allows us to state how 
confident we are that this is the case. m 
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Because variation is everywhere, conclusions are uncertain. Statistics gives us a lan- 
guage for talking about uncertainty that is used and understood by statistically literate 
people everywhere. In the case of HPV vaccine, the medical journal used that lan- 
guage to tell us: “Vaccine efficiency .. . was 98% (95 percent confidence interval 
86% to 100%).”> That “98% effective” is, in Arthur Nielsen’s words, “shorthand 
for a range that describes our actual knowledge of the underlying condition.” 
The range is 86% to 100%, and we are 95 percent confident that the truth lies 
in that range. We will soon learn to understand this language. We can’t escape 
variation and uncertainty. Learning statistics enables us to live more comfortably 
with these realities. 


The purpose of Basic Practice of Statistics is to give you a working knowledge 
of the ideas and tools of practical statistics. We will divide practical statistics into 
three main areas. 


« Data analysis concerns methods and strategies for looking at data; exploring, 
organizing, and describing data using graphs and numerical summaries. Your 
thoughtful exploration allows data to illuminate reality. Part I of this book 
(Chapters 1 to 6) discusses data analysis. 


«= Data production provides methods for producing data that can give clear 
answers to specific questions. Where data come from matters and is often the 
most important limitation on their usefulness. Basic concepts about how to 
select samples and design experiments are some of the most influential ideas 
in statistics. These concepts are the subject of Chapters 8 and 9. 


» Statistical inference moves beyond the data in hand to draw conclusions 
about some wider universe. Statistical conclusions aren’t yes-or-no answers— 
they must take into account that variation is everywhere; variability among 
people, animals, or objects and uncertainty in data. To describe variation and 
uncertainty, inference uses the language of probability, introduced in Chapter 12. 
Because we are concerned with practice rather than theory, we need only a 
limited knowledge of probability. Chapters 13 and 14 offer more probability 
for those who want it. Chapters 15 to 18 discuss the reasoning of statistical 
inference. These chapters are the key to the rest of the book. Chapters 20 
to 23 present inference as used in practice in the most common settings. 
Chapters 25 to 27 concern more advanced or specialized kinds of inference. 


Because data are numbers with a context, doing statistics means more than manipulat- 
ing numbers. You must state a problem in its real-world context, plan your specific 
statistical work in detail, solve the problem by making the necessary graphs and 
calculations, and conclude by explaining what your findings say about the real- 
world setting. We’ll make regular use of this four-step process to encourage good 
habits that go beyond graphs and calculations to ask, “What do the data tell me?” 

Statistics does involve lots of calculating and graphing. The text presents the 
techniques you need, but you should use technology to automate calculations and 
graphs as much as possible. Because the big ideas of statistics don’t depend on any 
particular level of access to technology, Basic Practice of Statistics does not require 
software or a graphing calculator until we reach the more advanced methods in 
Part V of the text. Even if you make little use of technology, you should look at 
the “Using Technology” sections throughout the book. You will see at once that 
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you can read and apply the output from almost any technology used for statistical 
calculations. The ideas really are more important than the details of how to do the 
calculations. 

Unless you have access to software or a graphing calculator, you will need a 
basic calculator with some built-in statistical functions. Specifically, your calcula- 
tor should find means and standard deviations and calculate correlations and 
regression lines. Look for a calculator that claims to do “two-variable statistics” 
or mentions “regression.” 

Although ability to carry out statistical procedures is very useful in academics 
and employment, the most important asset you can gain from the study of statis- 
tics is an understanding of the big ideas about working with data. Basic Practice of 
Statistics tries to explain the most important ideas of statistics, not just teach meth- 
ods. Some examples of big ideas that you will meet (one from each of the three areas 
of statistics) are “always plot your data,” “randomized comparative experiments,” 
and “statistical significance.” 

You learn statistics by doing statistical problems. As you read, you will see several 
levels of exercises, arranged to help you learn. Short “Apply Your Knowledge” prob- 
lem sets appear after each major idea. These are straightforward exercises that help 
you solidify the main points as you read. Be sure you can do these exercises before 
going on. The end-of-chapter exercises begin with multiple-choice “Check Your 
Skills” exercises (with all answers in the back of the book). Use them to check your 
grasp of the basics. The regular “Chapter Exercises” help you combine all the ideas 
of a chapter. Finally, the four Part Review chapters (Chapters 7, 11, 19, and 24) 
look back over major blocks of learning, with many review exercises. At each step 
you are given less advance knowledge of exactly what statistical ideas and skills the 
problems will require, so each type of exercise requires more understanding. 

The key to learning is persistence. The main ideas of statistics, like the main ideas 
of any important subject, took a long time to discover and take some time to master. 
The gain will be worth the pain. 
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0.2 


Observational studies and experiments. Observa- 
tional studies have suggested that vitamin E reduces 
the risk of heart disease. Careful experiments, how- 
ever, showed that vitamin E has no effect. According 
to a commentary in the Journal of the American Medi- 
cal Association: 


Thus, vitamin E enters the category of therapies that were 
promising in epidemiologic and observational studies but 
failed to deliver in adequately powered randomized con- 
trolled trials. As in other studies, the “healthy user” bias must 
be considered; i.e., the healthy lifestyle behaviors that charac- 
terize individuals who care enough about their health to take 
various supplements are actually responsible for the better 
health, but this is minimized with the rigorous trial design.° 


(a) Reread Example 0.1 and the comments following it. 
Explain why observational studies suggest that Vitamin 
E therapy reduces the risk of heart disease by describing 
some lurking variables. 

(b) A randomized controlled trial is a type of experiment. 
How does “healthy user bias” explain how people who 
take vitamin E supplements have better health in 
observational studies but not in experiments? 


The price of gas. In Example 0.4 we examined the 
variation in the price of gasoline from 1990 to 2013. 
We saw both a regular pattern and the effects of 
international events. Figure 0.3 plots the average 
annual retail price of gasoline from 1929 to 1990.’ 
Prices are adjusted for inflation. What overall patterns 
do you observe? What departures from the overall 
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patterns do you observe? To what international events 
do these departures correspond? 


Online polls. Ed Schultz is a liberal political commen- 
tator and host of The Ed Show. After an often lengthy 
and impassioned monologue from Mr. Schultz, view- 
ers are asked to text their replies to a poll question. 
On June 23, 2013, after a monologue that included 
various issues on which the Republicans could not be 
trusted, Ed asked viewers to text in their responses 
to the question, “Do Republicans care about the per- 
sonal struggles of undocumented immigrants?” Ap- 
proximately 96.4% of those responding either by text 
or online said “no.” 


(a) This poll has some of the same problems as Ann 
Landers’ poll of Example 0.2. Do you think that the 
proportion of Americans who feel this way is higher, 
lower, or close to 96.4%? Explain. 

(b) For this poll, 868 people responded. Among those 
responding, 837 or 96.4% said “no.” Do you think 
the results would have been more trustworthy if 2500 
people had responded instead of 868? Explain. 


Traffic fatalities and 9/11. Figure 0.4 provides informa- 
tion on the number of fatal traffic accidents by month 
for the years 1996-2001.° The vertical blue line above 
each month gives the lowest to highest number of fa- 
tal crashes for that month for the years 1996-2000. For 
example, in January the number of fatal crashes for the 
five years from 1996 through 2000 was between about 
2600 and 2900. The blue dots give the number of fatal 
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FIGURE 0.3 


Year 


The average annual retail price of gasoline, 1929 to 1990. Prices are adjusted for inflation. 
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Number of fatal traffic accidents in the United States in 1996 through 2000 versus 2001. For 
each month, the blue lines represent the range of the number of fatal accidents from 1996 
through 2000, and the blue dot gives the number of fatal accidents in 2001. 


crashes for each month in 2001. The numbers of fatal 
crashes from January through August of 2001 follow the 
general pattern for the five preceding years as we see the 
blue dots are well within the blue lines for each month. 


(a) What happened in the last three months of 2001? 

The numbers of fatal crashes in October through Decem- 
ber of 2001 are consistently at or above the values for the 
previous five years. How can you tell this from the graph? 


(b) On September 11, 2001, terrorists hijacked four 
US. airplanes and used them to strike various targets 
on the East Coast. In part (a), we saw from the graph 
that fatal traffic accidents seemed to be unusually 
high in the three months following the attacks. Did 
the terrorists cause fatal crashes to increase? Can you 
give a simple explanation for the apparent increase in 
fatal crashes during these months? 


Moments by Mullineux/Shutterstock 


Exploring Data 


6 6 hat do the data say?” is the first question we ask in any sta- 
tistical study. Data analysis answers this question by open- 
ended exploration of the data. The tools of data analysis are 
graphs such as histograms and scatterplots, and numerical 

measures such as means and correlations. At least as important as the tools are 

principles that organize our thinking as we examine data. The seven chapters 
in Part | present the principles and tools of statistical data analysis. They equip 
you with skills that are immediately useful whenever you deal with numbers. 

These chapters reflect the strong emphasis on exploring data that charac- 
terizes modern statistics. Sometimes we hope to draw conclusions that apply 
to a setting that goes beyond the data in hand. This is statistical inference, the 
topic of much of the rest of the book. Data analysis is essential if we are to trust 
the results of inference, but data analysis isn't just preparation for inference. 
Roughly speaking, you can always do data analysis but inference requires rather 
special conditions. 

One of the organizing principles of data analysis is to first look at one thing 
at a time and then at relationships. Our presentation follows this principle. In 
Chapters 1, 2, and 3 you will study variables and their distributions. Chapters 4, 
5, and 6 concern relationships among variables. Chapter 7 reviews this part of 
the text. 


PAR! 


EXPLORING DATA: 
Variables and Distributions 


CHAPTER 1 Picturing 
Distributions with Graphs 


CHAPTER 2 Describing 
Distributions with Numbers 


CHAPTER 3. The Normal 
Distributions 


EXPLORING DATA: 
Relationships 


CHAPTER 4 Scatterplots 
and Correlation 


CHAPTER 5 Regression 


CHAPTER 6 _ [wo-Way 
Tables* 


CHAPTER 7 Exploring 
Data: Part | Review 
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tatistics is the science of data. The volume of data available to us is overwhelm- 

ing. For example, the U.S. Census Bureau’s American Community Survey col- 

lects data from about 3,000,000 housing units each year. Astronomers work 
with data on tens of millions of galaxies. The checkout scanners at Walmart’s 8000 
stores in 15 countries record hundreds of millions of transactions every week, all 
saved to inform both Walmart and its suppliers. The first step in dealing with such 
a flood of data is to organize our thinking about data. Fortunately, we can do this 
without looking at millions of data points. 


1.1 Individuals and variables 


Any set of data contains information about some group of individuals. The infor- 
mation is organized in variables. 


Individuals and variables 


Categorical variables: pie 
charts and bar graphs 


Quantitative variables: 
histograms 


Interpreting histograms 


Quantitative variables: 
stemplots 


Time plots 
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TQ pata 

Md The documentary Particle 
Fever recreates the excitement of 
the Large Hadron Collider (LHC) 
experiment. The LHC is a 17 mile 
tunnel, designed to accelerate a 
proton to close to the speed of light, 
and then have protons collide to 
help physicists understand how 
the universe works. When the first 
collisions are recorded live in the 
film, American physicist Monica 
Dunford exclaims, “We have data. 
It’s unbelievable how fantastic this 
data is.” 


Individuals and Variables 


Individuals are the objects described by a set of data. Individuals may be people, but 
they may also be animals or things. 

A variable is any characteristic of an individual. A variable can take different values 
for different individuals. 


A college’s student database, for example, includes data about every currently 
enrolled student. The students are the individuals described by the data set. For 
each individual, the data contain the values of variables such as date of birth, choice 
of major, and grade point average (GPA). In practice, any set of data is accompanied 
by background information that helps us understand the data. When you plan a 
statistical study or explore data from someone else’s work, ask yourself the following 
questions: 


1. Who? What individuals do the data describe? How many individuals appear in 
the data? 


2. What? How many variables do the data contain? What are the exact defini- 
tions of these variables? In what unit of measurement is each variable re- 
corded? Weights, for example, might be recorded in pounds, in thousands of 
pounds, or in kilograms. 


3. Where? Student GPAs and SAT scores (or lack of them) will vary from college 
to college depending on many variables, including admissions “selectivity” for 
the college. 


When? Students change from year to year, as do prices, salaries, and so forth. 


5. Why? What purpose do the data have? Do we hope to answer some specific 
questions? Do we want answers for just these individuals or for some larger 
group that these individuals are supposed to represent? Are the individuals 
and variables suitable for the intended purpose? 


Some variables, such as a person’s sex or college major, simply place individuals 
into categories. Others, like height and GPA, take numerical values for which we 
can do arithmetic. It makes sense to give an average income for a company’s employ- 
ees, but it does not make sense to give an “average” sex. We can, however, count the 
numbers of female and male employees and do arithmetic with these counts. 


Categorical and Quantitative Variables 


A categorical variable places an individual into one of several groups or categories. 
A quantitative variable takes numerical values for which arithmetic operations 
such as adding and averaging make sense. The values of a quantitative variable are 
usually recorded with a unit of measurement such as seconds or kilograms. 


At the U.S. Census Bureau website, you can view the detailed data collected by the 
American Community Survey, though of course the identities of people and housing 
units are protected. If you choose the file of data on people, the individuals are the 
people living in the housing units contacted by the survey. Over 100 variables are 
recorded for each individual. Figure 1.1 displays a very small part of the data. 
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Each row records data on one individual. Each column contains the values of one 
variable for all the individuals. Translated from the U.S. Census Bureau’s abbreviations, 
the variables are 


SERIALNO _ An identifying number for the household. 


PWGTP Weight in pounds. 

AGEP Age in years. 

JWMNP Travel time to work in minutes. 

SCHL Highest level of education. The numbers designate categories, 


not specific grades. For example, 9 = high school 
graduate, 10 = some college but no degree, 
and 13 = bachelor’s degree. 
SEX Sex, designated by 1 = male and 2 = female. 
WAGP Wage and salary income last year, in dollars. 


Look at the highlighted row in Figure 1.1. This individual is a 53-year-old man 
who weighs 234 pounds, travels 10 minutes to work, has a bachelor’s degree, and 
earned $83,000 last year. 

In addition to the household serial number, there are six variables. Education and 
sex are categorical variables. The values for education and sex are stored as numbers, 
but these numbers are just labels for the categories and have no units of measure- 
ment. The other four variables are quantitative. Their values do have units. These vari- 
ables are weight in pounds, age in years, travel time in minutes, and income in dollars. 

The purpose of the American Community Survey is to collect data that represent 
the entire nation to guide government policy and business decisions. To do this, the 
households contacted are chosen at random from all households in the country. We 
will see in Chapter 8 why choosing at random is a good idea. m 


Most data tables follow this format—each row is an individual, and each column 
is a variable. The data set in Figure 1.1 appears in a spreadsheet program that has _—_ spreadsheet 
rows and columns ready for your use. Spreadsheets are commonly used to enter and 
transmit data and to do simple calculations. 
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() LaunchPad Online Resources 


* The Snapshots video, Data and Distributions, provides a nice introduction to 
the ideas of this section. 


* The StatClips Examples video, Basic Principles of Exploring Data Example 
A, describes the variables collected in the American Community Survey from 
Example 1.1. 


Apply Your Knowledge 


1.1. Fuel Economy. Here is a small part of a data set that describes the fuel 
economy (in miles per gallon (mpg)) of model year 2014 motor vehicles: 


Make and Vehicle Transmission Number of City Highway Annual 
Model Class Type Cylinders mpg mpg Fuel Cost 
Chevrolet Corvette Two-seater Manual 8 17 29 $2,650 
Nissan Cube Small station wagon Manual 4 25 30 $1,850 
Ford Fusion Midsize Automatic 4 23 36 $1,800 
Chevrolet Impala Large Automatic 6 18 28 $2,400 


The annual fuel cost is an estimate assuming 15,000 miles of travel a year (55% 
city and 45% highway) and an average fuel price. 


(a) What are the individuals in this data set? 
(b) For each individual, what variables are given? Which of these variables 


are categorical, and which are quantitative? In what units are the 
quantitative variables measured? 


1.2 Students and Exercise. You are preparing to study the exercise habits of 
college students. Describe two categorical variables and two quantitative vari- 
ables that you might measure for each student. Give the units of measurement 
for the quantitative variables. 


Statistical tools and ideas help us examine data in order to describe their main 
exploratory data analysis _—_ features. This examination is called exploratory data analysis. Like an explorer 
crossing unknown lands, we want first to simply describe what we see. Here are two 

principles that help us organize our exploration of a set of data. 


Exploring Data 


1. Begin by examining each variable by itself. Then move on to study the relation- 
ships among the variables. 


2. Begin with a graph or graphs. Then add numerical summaries of specific aspects 
of the data. 


1.2 Categorical Variables: Pie Charts and Bar Graphs 


We will follow these principles in organizing our learning. Chapters 1 to 3 present 
methods for describing a single variable. We study relationships among several vari- 
ables in Chapters 4 to 6. In each case, we begin with graphical displays, then add 
numerical summaries for more complete description. 

The proper choice of graph depends on the nature of the variable. To examine a 
single variable, we usually want to display its distribution. 


Distribution of a Variable 


The distribution of a variable tells us what values it takes and how often it takes 
these values. 


The values of a categorical variable are labels for the categories. The distribution of 
a categorical variable lists the categories and gives either the count or the percent 
of individuals who fall into each category. 


Approximately 1.5 million full-time, first-year students enrolled in colleges and uni- 
versities in 2013. What do they plan to study? Here are data on the percents of 
first-year students who plan to major in several discipline areas:' 


Field of Study Percent of Students 
Arts and humanities 10.6 
Biological sciences 14.7 
Business 14.5 
Education 5) 
Engineering 11.2 
Health professions 12.8 
Math and computer science 3.7 
Physical sciences 2.4 
Social sciences 10.1 
Other majors and undeclared 14.9 
Total 100.1 


It's a good idea to check data for consistency. The percents should add to 100%. In 
fact, they add to 100.1%. What happened? Each percent is rounded to the nearest 
tenth. The exact percents would add to 100, but the rounded percents only come 
close. This is roundoff error. Roundoff errors don’t point to mistakes in our work, 
just to the effect of rounding off results. m 


Columns of numbers take time to read. You can use a pie chart or a bar graph 
to display the distribution of a categorical variable more vividly. Figures 1.2 and 1.3 
illustrate these displays for the distribution of intended college majors. 
Pie charts show the distribution of a categorical variable as a “pie” whose slices 
are sized by the counts or percents for the categories. Pie charts are awkward to 
make by hand, but software will do the job for you. A pie chart must include 
A all the categories that make up a whole. Use a pie chart only when you want 
to emphasize each category's relation to the whole. We need the “Other majors 
and undeclared” category in Example 1.2 to complete the whole (all intended 
majors) and allow us to make the pie chart in Figure 1.2. 


MAJORS 


roundoff error 


pie charts 
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FIGURE 1.2 Other 
You can use a pie chart to display the 
distribution of a categorical variable. 


Arts/humanities 


Biological sciences 


This pie chart shows the distribution of Social sciences 

intended majors of students entering 

college. Physical sciences |__——S 

Math/Comp Business 
eee This slice occupies 14.5% of the 
cere Education | pie because 14.5% of students 
Engineering plan to major in business. 
bar graphs Bar graphs represent each category as a bar. The bar heights show the category 


counts or percents. Bar graphs are easier to make than pie charts and also easier 
to read. Figure 1.3 displays two bar graphs of the data on intended majors. The 
first orders the bars alphabetically by field of study (with “Other” at the end). It is 
often better to arrange the bars in order of height, as in Figure 1.3(b). This helps us 
immediately see which majors appear most often. 

Bar graphs are more flexible than pie charts. Both graphs can display the distri- 
bution of a categorical variable, but a bar graph can also compare any set of quanti- 
ties that are measured in the same units. 


This bar has height 12.8% 
because 12.8% of students 
plan to major in the health 
professions. 
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FIGURE 1.3 
Bar graphs of the distribution of intended majors of students entering college. In part (a), the bars follow the 
alphabetical order of fields of study. In part (b), the same bars appear in order of height. 


a What sources do Americans aged 12-24 years use to keep up-to-date and learn 
at it about music? Among those saying it was important to keep up with music, Arbitron 
asked which of several sources they had ever used. Here are the percents who have 
used each source.? 
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Percent of 12—24s Who 


Source Have Used Each Source 
“AMFMradio = i ssti(i‘ié;O FO! 

Friends/family 79 

YouTube 77 

Music television channels 53) 

Facebook 57 

Pandora 58 

Apple iTunes 47 

Information at local stores 29 

SiriusXM Satellite Radio 12 

Music blogs 20 

iHeartRadio 22 

Spotify 13 


We can’t make a pie chart to display these data. Each percent in the table refers to a 
different source, not to parts of a single whole. Figure 1.4 is a bar graph comparing 
the 12 sources. We have again arranged the bars in order of height. = 
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each source to learn about new music, 


Source of new music information for Example 1.3. 


Bar graphs and pie charts are mainly tools for presenting data: they help your 
audience grasp data quickly. Because it is easy to understand data on a single cate- 
gorical variable without a graph, bar graphs and pie charts are of limited use for data 
analysis. We will move on to quantitative variables, where graphs are essential tools. 


() LaunchPad Online Resources 


* The Snapshots video, Visualizing and Summarizing Categorical Data, 
illustrates the ideas of both pie charts and bar graphs. 


* The StatClips Examples video, Summaries and Pictures for Categorical Data 
Example B, provides the details of constructing pie charts and bar graphs 
through an example. 
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Apply Your Knowledge 


1.3. Do You Listen to Country Radio? The rating service Arbitron places 
US. radio stations into more than 50 categories that describe the kinds of 
programs they broadcast. Which formats attract the largest audiences? Here 
are Arbitron’s measurements of the share of the listening audience (aged 12 
and over) at a given time for the most popular formats:* £y),) RADIO 


Format Audience Share 
Country 14.2% 
News/talk/information 11.4% 
Pop contemporary hit 8.2% 
Adult contemporary 8.1% 
Classic hits 5.2% 
Classic rock 5.2% 
Hot adult contemporary 4.7% 
Urban adult contemporary 4.1% 
Rhythmic contemporary hit 3.4% 
All sports 3.1% 
Urban contemporary 3.0% 
Contemporary Christian 2.9% 
Mexican regional 2.9% 


(a) What is the sum of the audience shares for these formats? What per- 
cent of the radio audience listens to stations with other formats? 

(b) Make a bar graph to display these data. Be sure to include an “Other 
format” category. 


(c) Would it be correct to display these data in a pie chart? Why or why not? 


1.4 How Do Students Pay for College? The Higher Education Research 
Institute’s Freshman Survey includes over 200,000 first-time, full-time fresh- 
men who entered college in 2013.4 The survey reports the following data on 
the sources students use to pay for college expenses. EXPENSE 


Source for College Expenses Students 
Family resources 77.8% 
Student resources 62.3% 
Aid—not to be repaid 72.9% 
Aid—to be repaid 50.8% 


(a) Explain why it is not correct to use a pie chart to display these data. 


(b) Make a bar graph of the data. Notice that because the data contrast 
groups such as family and student resources, it is better to keep these bars 
next to each other rather than to arrange the bars in order of height. 


1.5 Never on Sunday? Births are not, as you might think, evenly distributed 
across the days of the week. Here are the average numbers of babies born on 
each day of the week in 2012:5 El) BIRTHS 
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Day Births 
Sunday 7,134 
Monday 11,430 
Tuesday 12,387 
Wednesday 12,230 
Thursday 12,308 
Friday 12,047 
Saturday 8,124 


Present these data in a well-labeled bar graph. Would it also be correct to make a 
pie chart? Suggest some possible reasons why there are fewer births on weekends. 


Quantitative variables often take many values. The distribution tells us what 
values the variable takes and how often it takes these values. A graph of the distri- 
bution is clearer if nearby values are grouped together. The most common graph of 
the distribution of one quantitative variable is a histogram. 


histogram 


What percent of your home state’s high school students graduate within four years? 
The No Child Left Behind Act of 2001 uses on-time high school graduation rates as 
one of its monitoring requirements. However, in 2001 most states were not collecting 
the necessary data to compute these rates accurately. The Freshman Graduation Rate 
(FGR) counts the number of high school graduates in a given year for a state and 
divides this by the number of ninth graders enrolled four years previously. Although 
the FGR can be computed from readily available data, it neglects high school students 
moving into and out of a state and may include students who have repeated a grade. 
Several alternative measures are available that partially correct for these deficiencies, 
but states have been free to choose their own measure, and the resulting rates can 
differ by more than 10%. Federal law now requires all states to use a common, more 
rigorous computation, the Adjusted Cohort Graduation Rate, that tracks individual 
students. Table 1.1 presents the data for 2010-11, the first year in which states 
used a common formula and for which graduation rates could be compared between 
states.° Idaho, Kentucky, and Oklahoma received “timeline extensions” and were not 
required to file in 2010-11. 

The individuals in this data set are the states. The variable is the percent of a state’s 
high school students who graduate within four years. The states vary quite a bit on this 
variable, from 59% in the District of Columbia to 88% in lowa. It’s much easier to see 
how your state compares with other states from a graph like a histogram than from 
the table. To make a histogram of the distribution of this variable, proceed as follows: 


Step 1. Choose the classes. Divide the range of the data into classes of equal 
width. The data in Table 1.1 range from 59 to 88, so we decide to use these classes: 


percent on-time graduates between 55 and 60 (55 to <60) 
percent on-time graduates between 60 and 65 (60 to <65) 


percent on-time graduates between 85 and 90 (85 to <90) 


ny WHAT'S THAT 
utd NUMBER? 


You might think that numbers, 
unlike words, are universal. Think 
again. A “billion” in the United 
States means 1,000,000,000 (nine 
zeros). In Europe, a “billion” is 
1,000,000,000,000 (12 zeros). 

OK, those are words that describe 
numbers. But those commas in big 
numbers are periods in many other 
languages. This is so confusing that 
international standards call for 
spaces instead, so that an American 
billion is written 1 000 000 000. And 
the decimal point of the English- 
speaking world is the decimal comma 
in many other languages, so that 
3.1416 in the United States becomes 
3,1416 in Europe. So what is the 
number 10,642.389? It depends on 
where you are. 
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STATE PERCENT REGION | STATE PERCENT REGION | STATE PERCENT REGION 
Alabama 72 S Louisiana 71 S Ohio 80 MW 
Alaska 68 W Maine 84 NE Oklahoma = S 
Arizona 78 W Maryland 83 S Oregon 68 W 
Arkansas 81 S Massachusetts 83 NE Pennsylvania 83 NE 
California 76 W Michigan 74 MW Rhode Island 77 NE 
Colorado 74 W Minnesota 77 MW South Carolina 74 5 
Connecticut 83 NE Mississippi 75 S South Dakota 83 MW 
Delaware 78 S Missouri 81 MW Tennessee 86 S 
Florida 71 S Montana 82 W Texas 86 S 
Georgia 67 § Nebraska 86 MW Utah 76 W 
Hawaii 80 W Nevada 62 W Vermont 87 NE 
Idaho — W New Hampshire 86 NE Virginia 82 S 
Illinois 84 MW New Jersey 83 NE Washington 76 W 
Indiana 86 MW New Mexico 63 W West Virginia 76 S 
Iowa 88 MW New York 77 NE Wisconsin 87 MW 
Kansas 83 MW North Carolina 78 § Wyoming 80 W 
Kentucky — S North Dakota 86 MW Dist. of Columbia 59 S 


It is important to specify the classes carefully so that each individual falls into exactly 
one class. Our notation 55 to <60 indicates that the first class includes states with 
graduation rates starting at 55% and up to, but not including, graduation rates of 
60%. Thus, a state with an on-time graduation rate of 60% falls into the second 
class, whereas a state with an on-time graduation rate of 59% falls into the first class. 
It is equally correct to use classes 56 to <62, 62 to <68, and so forth. Just be sure 
to specify the classes precisely so that each individual falls into exactly one class. 


Step 2. Count the individuals in each class. Here are the counts: 


Class Count Class Count 
55 to <60 1 75 to <80 11 
60 to <65 2 80 to <85 16 
65 to <70 3 85 to <90 9 
70 to <75 6 


Check that the counts add to 48, the number of individuals in the data set (the 47 
states reporting and the District of Columbia). 


Step 3. Draw the histogram. Mark the scale for the variable whose distribution 
you are displaying on the horizontal axis. That’s the percent of a state’s high school 
students who graduate within four years. The scale runs from 55 to 90 because that 
is the span of the classes we chose. The vertical axis contains the scale of counts. 
Each bar represents a class. The base of the bar covers the class, and the bar height 
is the class count. Draw the bars with no horizontal space between them unless a 
class is empty, so that its bar has height zero. Figure 1.5 is our histogram. Remember, 
an observation on the boundary of the bars—say, 65—is counted in the bar to its 
right. m 
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FIGURE 1.5 
4 Histogram of the distribution of the per- 
cent of on-time high school graduates in 
47 states and the District of Columbia, 
for Example 1.4. 


This bar has height 11 
because 11 states have 
between 75% and 79% 
on-time HS graduation 
rates. 


Number of states 


5) 60 65 70 U5 80 85 90 
Percent of on-time HS graduates 


histogram displays the distribution of a quantitative variable. The horizon- 

tal axis of a histogram is marked in the units of measurement for the vari- 
able. A bar graph compares the sizes of different quantities. The horizontal axis of 
a bar graph simply identifies the quantities being compared and need not have any 
measurement scale. These quantities may be the values of a categorical variable, but 
they may also be unrelated, like the sources used to learn about music in Example 1.3. 
Draw bar graphs with blank space between the bars to separate the quantities 
being compared. Draw histograms with no space, to indicate that all values of the 
variable are covered. A gap between bars in a histogram indicates that there are 
no values for that class. 

Our eyes respond to the area of the bars in a histogram.’ Because the classes are 
all the same width, area is determined by height, and all classes are fairly repre- 
sented. There is no one right choice of the classes in a histogram. Too few classes 
will give a “skyscraper” graph, with all values in a few classes with tall bars. Too 
many will produce a “pancake” graph, with most classes having one or no observa- 
tions. Neither choice will give a good picture of the shape of the distribution. You 
must use your judgment in choosing classes to display the shape. Statistics software 
will choose the classes for you. The software's choice is usually a good one, but you 
can change it if you want. The histogram function in the One-Variable Statistical 


Calculator applet on the text website allows you to change the number of classes by : 


A Although histograms resemble bar graphs, their details and uses are different. A 


dragging with the mouse, so that it is easy to see how the choice of classes affects 
the histogram. 


Apply Your Knowledge 


1.6 Foreign Born. How are foreign-born residents distributed in the United 
States? The country as a whole has 13.0% foreign-born residents, but the states 
vary from 1.3% in West Virginia to 27.1% in California. Table 1.2 presents 
the data for all 50 states and the District of Columbia.® Make a histogram of 


24, CHAPTER 1 @ Picturing Distributions with Graphs 


STATE PERCENT STATE PERCENT STATE PERCENT 
Alabama 3.4 Louisiana 3.8 Ohio 3.9 
Alaska 6.2 Maine 3.3 Oklahoma ae) 
Arizona 13.4 Maryland 13.7 Oregon 9.5 
Arkansas 4.3 Massachusetts 14.9 Pennsylvania 5.9 
California 27.1 Michigan 6.1 Rhode Island 13.5 
Colorado 9.7 Minnesota 7.4 South Carolina 4.7 
Connecticut 13.3 Mississippi 2.3 South Dakota 2.9 
Delaware 8.6 Missouri 4.1 Tennessee 4.7 
Florida 19.4 Montana 2.0 Texas 16.5 
Georgia 9.6 Nebraska 6.2 Utah 8.4 
Hawaii 18.2 Nevada 19.2 Vermont 3.9 
Idaho 5.9 New Hampshire By) Virginia Hiloil 
Illinois 13.9 New Jersey 21.3 Washington 13.4 
Indiana 4.6 New Mexico 10.2 West Virginia 1.3 
Iowa 4.3 New York 22.2 Wisconsin 4.8 
Kansas 6.7 North Carolina tes) Wyoming 2.9 
Kentucky 3.3 North Dakota 2.4 Dist. of Columbia 13.6 


© 


the percents using classes of width 5% starting at 0.0%. That is, the first bar 
covers 0.0% to <5.0%, the second covers 5.0% to <10.0%, and so on. (Make 
this histogram by hand, even if you have software, to be sure you understand 
the process. You may then want to compare your histogram with your soft- 
ware’s choice.) FOREIGN 


1.7. Choosing Classes in a Histogram. ‘The data set menu that accompa- 
nies the One-Variable Statistical Calculator applet includes the data on for- 
eign-born residents in the states from Table 1.2. Choose these data, then 
click on the “Histogram” tab to see a histogram. 


(a) How many classes does the applet choose to use? (You can click on the 
graph outside the bars to get a count of classes.) 

(b) Click on the graph and drag to the left. What is the smallest number of 
classes you can get? What are the lower and upper bounds of each class? 
(Click on the bar to find out.) Make a rough sketch of this histogram. 

(c) Click and drag to the right. What is the greatest number of classes you 
can get? How many observations does the largest class have? 

(d) You see that the choice of classes changes the appearance of a histogram. 
Drag back and forth until you get the histogram that you think best dis- 
plays the distribution. How many classes did you use? Why do you think 
this is best? 


Making a statistical graph is not an end in itself. The purpose of graphs is to help 
us understand the data. After you make a graph, always ask, “What do I see?” Once 
you have displayed a distribution, you can see its important features as follows. 


1.4 


Examining a Histogram 


In any graph of data, look for the overall pattern and for striking deviations from 
that pattern. 

You can describe the overall pattern of a histogram by its shape, center, and variability. 
You will sometimes see variability referred to as spread. 

An important kind of deviation is an outlier, an individual value that falls outside 
the overall pattern. 


One way to describe the center of a distribution is by its midpoint, the value with 
roughly half the observations taking smaller values and half taking larger values. 
To find the midpoint, order the observations from smallest to largest, making sure 
to include repeated observations as many times as they appear in the data. First 
cross off the largest and smallest observations, then the largest and smallest of those 
remaining, and continue this process. If there were an odd number of observations 
initially, you will be left with a single observation, which is the midpoint. If there 
were an even number of observations initially, you will be left with two observa- 
tions, and their average is the midpoint. 

For now, we will describe the variability of a distribution by giving the smallest 
and largest values. We will learn better ways to describe center and variability in 
Chapter 2. The overall shape of a distribution can often be described in terms of 
symmetry or skewness, defined as follows. 


Symmetric and Skewed Distributions 


A distribution is symmetric if the right and left sides of the histogram are approxi- 
mately mirror images of each other. 


A distribution is skewed to the right if the right side of the histogram (containing 
the half of the observations with larger values) extends much farther out than the left 
side. It is skewed to the left if the left side of the histogram extends much farther 
out than the right side. 


Interpreting Histograms 


Look again at the histogram in Figure 1.5. To describe the distribution, we want to 
look at its overall pattern and any deviations. 


Shape: The distribution has a single peak, which represents states in which 
between 80% and 85% of students graduate high school on time. The distribution 
is skewed to the left. A majority of states have more than 75% of students graduat- 
ing high school on time, but several states have much lower percents, so the graph 
extends quite far to the left of its peak. 


Center: Arranging the observations from Table 1.1 in order of size shows that 80% 
is the midpoint of the distribution. There are a total of 48 observations, and if we cross 
off the 23 highest graduation rates and the 23 lowest graduation rates, we are left with 
two graduation rates, both of which are 80%. The center is their average, which is 80%. 


Variability: The graduation rates range from 59% to 88%, which shows consider- 
able variability in graduation rates among the states. 


Outliers: Figure 1.5 shows no observations outside the overall single-peaked, 
left-skewed pattern of the distribution. Figure 1.6 is another histogram of the same 
distribution, with classes of width 3% rather than 5%. Now there are three states that 
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FIGURE 1.6 

Another histogram of the distribution 
of the percent of on-time high school 
graduates, with narrower class widths 
than in Figure 1.5. Histograms with 
more classes show more detail but 
may have a less clear pattern. 


Number of states 


58 64 70 76 82 88 
Percent of on-time HS graduates 


stand a bit apart to the left of the rest of the distribution: the District of Columbia 
at 58%, Nevada at 62%, and New Mexico at 63%. Are these states outliers or just 
the smallest observations in a strongly skewed distribution? Unfortunately, there is 
no rule. Let’s agree to call attention to only strong outliers that suggest something 
special about an observation—or an error such as typing 10.1 as 101. These states 
do not appear to be strong outliers. m 


Figures 1.5 and 1.6 remind us that interpreting graphs calls for judgment. 

We also see that the choice of classes in a histogram can influence the appear- 

ance of a distribution. Because of this, and to avoid worrying about minor 
details, concentrate on the main features of a distribution that persist with several 
choices of class intervals. Look for major peaks, not for minor ups and downs, in the 
bars of the histogram. For example, don’t immediately conclude that Figure 1.6 
shows two important peaks, one between 76% and 79% and the second between 
82% and 85%. When you choose a larger number of class intervals, the histogram 
can become more jagged, leading to the appearance of multiple peaks that are close 
together. If Arizona, Delaware, and North Carolina had graduation rates of 79% 
instead of 78%, these states would have been in the class interval 79% to <82% 
rather than the class interval from 76% to <79%. This small change would have 
eliminated the second peak between 76% and 79% in Figure 1.6, leaving only one 
peak as in Figure 1.5. Be sure to check for clear outliers, not just for the smallest and 
largest observations, and look for rough symmetry or clear skewness. 

Here are more examples of describing the overall pattern of a histogram. 


IOWATEST 


Figure 1.7 displays the scores of all 947 seventh-grade students in the public 
schools of Gary, Indiana, on the vocabulary part of the lowa Tests of Basic Skills.° 
The distribution is single-peaked and symmetric. In mathematics, the two sides of 
symmetric patterns are exact mirror images. Real data are almost never exactly sym- 
metric. We are content to describe Figure 1.7 as symmetric. The center (half above, 
half below) is close to 7. This is seventh-grade reading level. The scores range from 
2.0 (second-grade level) to 12.1 (twelfth-grade level). 
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Notice that the vertical scale in Figure 1.7 is not the count of students but the percent of 
students in each histogram class. A histogram of percents rather than counts is convenient 
when we want to compare several distributions. To compare Gary with Los Angeles, a much 
bigger city, we would use percents so that both histograms have the same vertical scale. m 


Percent of seventh-grade students 


FIGURE 1.7 

Histogram of the lowa Tests vocabulary 
scores of all seventh-grade students in 
Gary, Indiana, for Example 1.6. This 
distribution is single peaked and 
symmetric. 


2 4 6 8 10 R 
lowa Test vocabulary score 


Depending on where you went to high school, the answer to this question may be a 
“almost everybody” or “almost nobody.” Figure 1.8 is a histogram of the percent of suit 
high school graduates in each state who took the SAT test in 2013.'° 


SATPCT 
Two peaks suggest that 
the data include two types 
of states. 
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FIGURE 1.8 
Histogram of the percent of high school 
graduates in each state who took the 
SAT Reasoning test, for Example 1.7. 
The graph shows two groups of states: 


0 20 40 60 80 100 ACT states (where few students take 
Percent of high school graduates who took the SAT the SAT) at the left and SAT states at 
the right. 


Kallista Images/SuperStock 
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The histogram shows two peaks: a high peak at the left and a lower but broader peak 
centered in the 60% to <80% class. The presence of more than one peak suggests that 
a distribution mixes several kinds of individuals. That is the case here. There are two major 
tests of readiness for college, the ACT and the SAT. Most states have a strong preference 
for one or the other. In some states, many students take the ACT exam and few take the 
SAT—these states form the peak on the left. In other states, many students take the SAT 
and few choose the ACT—these states form the broader peak at the right. 

Giving the center and variability of this distribution is not very useful. The midpoint 
falls in the 40% to <60% class, between the two peaks. The story told by the histo- 
gram is in the two peaks corresponding to ACT states and SAT states. m 


The overall shape of a distribution is important information about a variable. 
Some variables have distributions with predictable shapes. Many biological mea- 
surements on specimens from the same species and sex—lengths of bird bills, 
heights of young women—have symmetric distributions. On the other hand, data 
on people’s incomes are usually strongly skewed to the right. There are many moder- 
ate incomes, some large incomes, and a few enormous incomes. Many distributions 
have irregular shapes that are neither symmetric nor skewed. Some data show other 
patterns, such as the two peaks in Figure 1.8. Use your eyes, describe the pattern 
you see, and then try to explain the pattern. 


Apply Your Knowledge 


FIGURE 1.9 

Histogram of the ages of infected 
individuals with Lyme disease for cases 
reported between 2001 and 2010 in 
the United States, for males and females, 
for Exercise 1.9. 


1.8 Foreign Born. In Exercise 1.6, you made a histogram of the percent of for- 
eign-born residents in each of the 50 states and the District of Columbia, given 
in Table 1.2. Describe the shape of the distribution. Is it closer to symmetric or 
skewed? What is the center (midpoint) of the data? What is the variability in 
terms of the smallest and largest values? Are there any states with an unusually 
large or small percent of foreign-born residents? FOREIGN 


1.9 Lyme Disease. Lyme disease is caused by a bacteria called Borrelia burg- 
dorferi and is spread through the bite of an infected black legged tick, gener- 
ally found in woods and grassy areas. There were 213,515 confirmed cases 


reported to the Centers for Disease Control (CDC) between 2001 and 2010, 
and these are broken down by age and sex in Figure 1.9."! 


Confirmed Lyme disease cases by age and sex—United States, 2001-10 
25,000 


20,000 4 : i Male i Female 


Cases 


Q@ 5 io 15 20 2 30 35 40 45 50 55 CO @ 70) 7H HO) & FO 
Age(years) 


Reported cases of Lyme disease are most common among boys aged 5-9. 
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Here is how Figure 1.9 relates to what we have been studying. The indi- 
viduals are the 213,515 people with confirmed cases, and two of the variables 
measured on each individual are sex and age. Considering males and females 
separately, we could draw a histogram of the variable age using class intervals 0 
to <5 years old, 5 to <10 years old, and so forth. Look at the leftmost two bars 
in Figure 1.9. The light blue bar shows that approximately 9000 of the males 
were between 0 and 5 years old, and the dark blue bar shows slightly fewer 
females were in this age range. If we were to take all the light blue bars and put 
them side by side, we would have the histogram of age for males using the class 
intervals stated. Similarly the dark blue bars show females. Because we are try- 
ing to display both histograms in the same graph, the bars for males and females 
within each class interval have been placed alongside each other for ease of 
comparison, with the bars for different class intervals separated by small spaces. 


(a) Describe the main features of the distribution of age for males. Why would 
describing this distribution in terms of only the center and variability be 
misleading? 

(b) Suppose that different age groups of males spend differing amounts of time 
outdoors. How could this fact be used to explain the pattern that you 
found in part (a)? Remember to use your eyes to describe the pattern you 
see, and then try to explain the pattern. 

(c) A 45-year-old male friend of yours looks at the histogram and tells you that 
he is planning on giving up hiking because this graph suggests he is in a 
high-risk group for getting Lyme disease. He will resume hiking when he is 
65, as he will be less likely to get Lyme disease at that age. Is this a correct 
interpretation of the histogram? 

(d) Comparing the histograms for males and females, how are they similar? 
What is the main difference, and why do you think it occurs? 


Histograms are not the only graphical display of distributions. For small data sets, 
a stemplot is quicker to make and presents more detailed information. 


To make a stemplot: 


1. Separate each observation into a stem, consisting of all but the final (rightmost) 
digit, and a leaf, the final digit. Stems may have as many digits as needed, but each 
leaf contains only a single digit. 

2. Write the stems in a vertical column with the smallest at the top, and draw a verti- 
cal line at the right of this column. Be sure to include all the stems needed to span 
the data, even when some stems will have no leaves. 


3. Write each leaf in the row to the right of its stem, in increasing order out from the 
stem. 


States. To make a stemplot of these data, take the whole-number part of the percent 
as the stem and the final digit (tenths) as the leaf. Write stems from 1 for West Virginia 
to 27 for California. Now add leaves. Nevada, 19.2%, has leaf 2 on the 19 stem. 


Table 1.2 presents the percents of state residents who were born outside the United “ly 


FOREIGN 
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Florida, at 19.4%, places leaf 4 on the same stem. These are the only observations 
on this stem. Arrange the leaves in order, so that 19] 24 is one row in the stemplot. 
Figure 1.10 is the complete stemplot for the data in Table 1.2. m 


1 3 
2 03499 
3 334899 
4 1336778 
5 |3599 
6 1227 
7 34 
& | 46 
9 |567 
10 2 
11 1 
12 
13 |3445679 
14 9 
15 
16 | 5 
17 
168 | 2 
19 |24 
20 This 19 stem contains 
21/35 the values 19.2 for 
22 | 2 Nevada and 19.4 for 
23 Florida. 
FIGURE 1.10 
Stemplot of the percents of foreign-born ot 
residents in the states, for Example 1.8. 25 
Each stem is a percent, and leaves are 1 


tenths of a percent. 


ay THE VITAL 
FEW 


Skewed distributions can show us 


where to concentrate our efforts. 
Ten percent of the cars on the road 
account for half of all carbon dioxide 
emissions. A histogram of CO, 
emissions would show many cars with 
small or moderate values and a few 
with very high values. Cleaning up 
or replacing these cars would reduce 
pollution at a cost much lower than 
that of programs aimed at all cars. 
Statisticians who work at improving 
quality in industry make a principle 
of this: Distinguish “the vital few” 
from “the trivial many.” 


rounding 


A stemplot looks like a histogram turned on end, with the stems corresponding 
to the class intervals. The first stem in Figure 1.10 contains all states with percents 
between 1.0% and 1.9%. In this example, the stemplot is like a histogram with 
many classes. Compare the stemplot to Figure 1.11 (page 31), which is a histogram 
of the same data using class intervals 1.0% to <2.0%, 2.0% to <3.0%, and so forth. 
Although Figures 1.10 and 1.11 display exactly the same pattern, the stemplot, 
unlike the histogram, preserves the actual value of each observation. 

In a stemplot, the classes (the stems of a stemplot) are given to you. 
A Histograms are more flexible than stemplots because you can choose the 
classes more easily. In both Figures 1.10 and 1.11, California (27.1%) stands 
slightly apart from the long right tail of the skewed distribution. Stemplots do not work 
well for large data sets, where each stem must hold a large number of leaves. Don’t try to 
make a stemplot of a large data set, such as the 947 Iowa Tests scores in Figure 1.7. 

When there are too many stems, there are often no leaves or just one or two 
leaves on many of the stems as in Figure 1.10. The number of stems can be reduced 
by first rounding the data. In this example, we can round the data for each state to 
the nearest percent before drawing the stemplot. Here is the result: 


0) 1222223333334444444555566667788999 
1)01333333346699 
2,127 
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Histogram of the percents of foreign-born residents in the states, for Example 1.8. The class widths have been chosen 


to agree with the widths of the stems in the stemplot in Figure 1.10. 


Now it seems that there are too few stems. Splitting stems in a stemplot will dou- 
ble the number of stems when all the leaves would otherwise fall on just a few stems, 
as occurred when we rounded to the nearest percent. Each stem then appears twice. 
Leaves 0 to 4 go on the upper stem, and leaves 5 to 9 go on the lower stem. If you 
split the stems with the data rounded to the nearest percent, the stemplot becomes 


0,;1222223333334444444 


0|;555566667768999 
1)/0133333334 
1|/6899 

2/12 

2/7 


which makes the right skew pattern clearer. Rounding and splitting stems are 

matters for judgment, like choosing the classes in a histogram. Some data require 

rounding but don’t require splitting stems, some require just splitting stems, and 

other data require both. The One-Variable Statistical Calculator applet on the text 

website allows you to decide whether to split stems so that it is easy to see the effect. 
Comparing Figures 1.11 (right-sskewed) and 1.5 (left-skewed) reminds us 

A that the direction of skewness is the direction of the long tail, not the direction 
where most observations are clustered. 


(5) LaunchPad Online Resources 


* The Snapshots video, Visualizing Quantitative Data, illustrates the ideas of 
both stemplots and histograms. 


* The StatBoards videos, Creating and Interpreting a Histogram and Creating 
and Interpreting a Stemplot, provide the details of constructing both stemplots 
and histograms through an example. 


splitting stems 


© 


31 


32. CHAPTER 1 @ Picturing Distributions with Graphs 


Apply Your Knowledge 


1.10 Graduation Rates. Table 1.1 presents the data for the on-time high 
school graduation rates in each of the 50 states and the District of 
Columbia. fijj) GRADRATE 


(a) Make a stemplot of the data as it appears in Table 1.1. Because the 
stemplot preserves the actual value of the observations, it is easy to 
find the midpoint and the variability. What are they? 


(b) For this data, splitting the stems gives a clearer picture of the shape of 
the distribution. Make a second stemplot splitting the stems, placing 
leaves 0 to 4 on the first stem and leaves 5 to 9 on the second stem 
of the same value. How does the resulting stemplot compare to the 
histogram in Figure 1.5? 


1.11 Health Care Spending. Table 1.3 shows the 2011 per capita total expen- 
diture on health in 35 countries with the highest gross domestic product 
in 2011." Health expenditure per capita is the sum of public and private 
health expenditure (in PPP international $) divided by population. Health 
expenditures include the provision of health services, family-planning ac- 
tivities, nutrition activities, and emergency aid designated for health, but 
exclude the provision of water and sanitation. Make a stemplot of the data 
after rounding to the nearest $100 (so that stems are thousands of dollars 
and leaves are hundreds of dollars). Split the stems, placing leaves 0 to 4 on 
the first stem and leaves 5 to 9 on the second stem of the same value. De- 
scribe the shape, center, and variability of the distribution. Which country 


is the high outlier? HEALTH 


COUNTRY DOLLARS COUNTRY DOLLARS COUNTRY DOLLARS 
Argentina 1434 India 141 Saudi Arabia 901 
Australia 3692 Indonesia 127 South Africa 943 
Austria 4482 Iran 929 Spain 3041 
Belgium 4119 Italy 3130 Sweden 3870 
Brazil 1043 Japan 3174 Switzerland 5564 
Canada 4520 Korea, South 2181 Thailand 353 
China 432 Malaysia 559 Turkey 1161 
Colombia 618 Mexico 940 United Arab Emirates 1732 
Denmark 4564 Netherlands 5123 United Kingdom 3322 
France 4086 Norway 5674 United States 8608 
Germany 4371 Poland 1423 Venezuela 659 
Greece 2918 Russia 1316 


Many variables are measured at intervals over time. We might, for example, mea- 
sure the height of a growing child or the price of a stock at the end of each month. 
In these examples, our main interest is change over time. To display change over 
time, make a time plot. 
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A time plot of a variable plots each observation against the time at which it was mea- 
sured. Always put time on the horizontal scale of your plot and the variable you are 
measuring on the vertical scale. Connecting the data points by lines helps emphasize 
any change over time. 


Water levels in Everglades National Park are critical to the survival of this unique 

region. The photo shows a water-monitoring station in Shark River Slough, the main ely 
path for surface water moving through the “river of grass” that is the Everglades. 
Each day the mean gauge height, the height in feet of the water surface above the 
gauge datum, is measured at the Shark River Slough monitoring station. (The gauge 
datum is a vertical control measure established in 1929 and is used as a reference 
for establishing varying elevations. It establishes a zero point from which to mea- 
sure the gauge height.) Figure 1.12 is a time plot of mean daily gauge height at 
this station from January 1, 2000, to December 31, 2012.'? m 


WATERLEV 


When you examine a time plot, look once again for an overall pattern and for 
strong deviations from the pattern. Figure 1.12 shows strong cycles, regular up-and- _ cycles 
down movements in water level. The cycles show the effects of Florida’s wet season 
(roughly June to November) and dry season (roughly December to May). Water 
levels are highest in late fall. If you look closely, you can see the year-to-year varia- 
tion. The dry season in 2003 ended early, with the first-ever April tropical storm. 
In consequence, the dry-season water level in 2003 did not dip as low as in other 
years. The drought in the southeastern portion of the country in 2008 and 2009 
shows up in the steep drop in the mean gauge height in 2009, whereas the lower 
peaks in 2006 and 2007 reflect lower water levels during the wet seasons in these 
years. Finally, in 2011, an extra-long dry season and a slow start to the 2011 rainy 
season compounded into the worst drought in the southwest Florida area in 80 
years, which shows up as the steep drop in the mean gauge height in 2011. 


Courtesy U.S. Geological Survey 


~~ 04 
7) 
€ 
» H \ 
= Nf 
Ss) 
v 
= 
% 
= 
c 
—) 
Zz ws 
‘a 
s 
§ asa Water levels at lowest 
s values on June 14 FIGURE 1.12 
and June 15, 2011 1; ae heigh 
al ime plot of average gauge height 
: : : : ; : : : ; : ; : : at a monitoring station in Everglades 
2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 National Park over a twelve-year period, 


Year for Example 1.9. The yearly cycles reflect 
Florida's wet and dry seasons. 
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frend Another common overall pattern in a time plot is a trend, a long-term upward or 
downward movement over time. Many economic variables show an upward trend. 
Incomes, house prices, and (alas) college tuitions tend to move generally upward 
over time. 
Histograms and time plots give different kinds of information about a variable. 
time series data ‘The time plot in Figure 1.12 presents time series data that show the change in 
cross-sectional data _ water level at one location over time. A histogram displays cross-sectional data, 


such as water levels at many locations in the Everglades at the same time. 


Apply Your Knowledge 


1.12 The Cost of College. Here are data on the average tuition and fees charged 
to in-state students by public four-year colleges and universities for the 1980 
to 2013 academic years. Because almost any variable measured in dollars 
increases over time due to inflation (the falling buying power of a dollar), 
the values are given in “constant dollars,” adjusted to have the same buying 
power that a dollar had in 2013." COLLEGEX 


Year Tuition Tuition Year Tuition | Year Tuition 
1980 $2274 1998 $4656 2007 $6943 
1981 $2321 1990 $3424 1999 $4719 2008 $7008 
1982 $2475 1991 $3621 2000 $4751 2009 $7672 
1983 $2689 1992 $3887 2001 $4966 2010 $8174 
$2761 1993 $4108 2002 = $5324 2011 $8557 


(a) Make a time plot of average tuition and fees. 


(b) What overall pattern does your plot show? 


(c) Some possible deviations from the overall pattern are outliers, periods 
when charges went down (in 2013 dollars) and periods of particularly 
rapid increase. Which are present in your plot, and during which years? 


(d) In looking for patterns, do you think that it would be better to study a 
time series of the tuition for each year or the percent increase for each 


year? Why? 


CHAPTER 1 SUMMARY 


Chapter Specifics 


a A data set contains information on a number of individuals. Individuals may be people, 
animals, or things. For each individual, the data give values for one or more variables. 
A variable describes some characteristic of an individual, such as a person’s height, sex, 


or salary. 


m Some variables are categorical, and others are quantitative. A categorical variable 
places each individual into a category, such as male or female. A quantitative variable 
has numerical values that describe some characteristic of each individual using a unit 
of measurement, such as height in centimeters or salary in dollars. 


m= Exploratory data analysis uses graphs and numerical summaries to describe the vari- 
ables in a data set and the relations among them. 


m After you understand the background of your data (individuals, variables, units of 
measurement), almost always the first thing to do is plot your data. 


a The distribution of a variable describes what values the variable takes and how often 
it takes these values. Pie charts and bar graphs display the distribution of a categorical 
variable. Bar graphs can also compare any set of quantities measured in the same units. 
Histograms and stemplots graph the distribution of a quantitative variable. 


m When examining any graph, look for an overall pattern and for notable deviations 
from the pattern. 


m Shape, center, and variability (or spread) describe the overall pattern of the distribu- 
tion of a quantitative variable. Some distributions have simple shapes, such as sym- 
metric or skewed. Not all distributions have a simple overall shape, especially when 
there are few observations. 


m Outliers are observations that lie outside the overall pattern of a distribution. Always 
look for outliers, and try to explain them. 


m When observations on a variable are taken over time, make a time plot that graphs 
time horizontally and the values of the variable vertically. A time plot can reveal 
trends, cycles, or other changes over time. 


Link It 


Practical statistics uses data to draw conclusions about some broader universe. You 
should reread Example 1.1, as it will help you understand this basic idea. For the Ameri- 
can Community Survey described in the example, the data are the responses from those 
households responding to the survey, although the broader universe of interest is the 
entire nation. 

In our study of practical statistics, we will divide the subject into three main areas. In 
exploratory data analysis, graphs and numerical summaries are used for exploring, organiz- 
ing, and describing data so that the patterns become apparent. Data production concerns 
where the data come from and helps us to understand whether what we learn from our data 
can be generalized to a wider universe. And statistical inference provides tools for general- 
izing what we learn to a wider universe. 

In this chapter we have begun to learn about data analysis. A data set can consist of 
hundreds of observations on many variables. Even if we consider only one variable at a 
time, it is difficult to see what the data have to say by scanning a list containing many data 
values. Graphs provide a visual tool for organizing and identifying patterns in data and are 
a good starting point in the exploration of the distribution of a variable. 

Pie charts and bar graphs can summarize the information in a categorical variable by giv- 
ing us the percent of the distribution in the various categories. Although a table containing 
the categories and percent gives the same information as a bar graph, a substantial advantage 
of the bar graph over a tabular presentation is that the bar graph allows us to visually compare 
percents among all categories simultaneously by means of the heights of the bars. 

Histograms and stemplots are graphical tools for summarizing the information pro- 
vided by a quantitative variable. The overall pattern in a histogram or stemplot illustrates 
some of the important features of the distribution of a variable that will be of interest as 
we continue our study of practical statistics. The center of the histogram tells us about 
the value of a “typical” observation on this variable, whereas the variability gives us a 
sense of how close most of the observations are to this value. Other interesting features 


Summary 
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are the presence of outliers and the general shape of the plot. For data collected over 
time, time plots can show patterns such as seasonal variation and trends in the variable. 
In the next chapter we will see how the information about the distribution of a variable 
can also be described using numerical summaries. 


() LaunchPad Online Resources 


If you are having difficulty with any of the sections of this chapter, these online 
resources should help prepare you to solve the exercises at the end of this chapter. 


* StatTutor starts with a video review of each section and asks a series of 
questions to check your understanding. 


* LearningCurve provides you with a series of questions about the chapter 
geared to your level of understanding. 


CHECK YOUR SKILLS 


The multiple-choice exercises in Check Your Skills ask straightfor- 
ward questions about basic facts from the chapter. Answers to all 
these exercises appear in the back of the book. You should expect 
almost all your answers to be correct. 


1.13 


Here are the first lines of a professor’s data set at the 
end of a statistics course: 


Name Major Total Points Grade 
ADVANI, SURA COMM SS)7/ B 
BARTON, DAVID HIST SAS iG 
BROWN, ANNETTE BIOL 446 A 
CHIU, SUN PSYC 405 B 
CORTEZ, MARIA PSYC 461 A 


The individuals in these data are 


(a) the students. 

(b) the total points. 

(c) the course grades. 

To display the distribution of grades (A, B, C, D, F) for 
all students in the course, it would be correct to use 


(a) a pie chart but not a bar graph. 
(b) a bar graph but not a pie chart. 
(c) either a pie chart or a bar graph. 


A description of different houses on the market in- 
cludes the variables square footage of the house and 
average monthly gas bill. 


(a) Square footage and average monthly gas bill are 
both categorical variables. 

(b) Square footage and average monthly gas bill are 
both quantitative variables. 

(c) Square footage is a categorical variable, and 
average monthly gas bill is a quantitative variable. 


A political party’s data bank includes the zip codes of 
past donors, such as 
47906 34236 53075 
30304 99709 


Zip code is a 


10010 90210 75204 


(a) quantitative variable. 
(b) categorical variable. 
(c) unit of measurement. 


Figure 1.6 is a histogram of the percent of on-time 
high school graduates in each state. The leftmost 
bar in the histogram covers percents of on-time high 
school graduates ranging from about 


(a) 58% to 64%. 
(b) 58% to 61%. 
(c) 0% to 58%. 


Here are the exam scores of 10 students in a statistics 
class: 


50) 35) lk) OP OLS} 
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1.20 
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To make a stemplot of these data, you would use stems 
(a) 2,3, 4, 5, 6, 7, 9. 

((s)) 2, 3, 4), 5h, Oy tly yO) 

(c) 20, 30, 40, 50, 60, 70, 80, 90. 

Where do students go to school? Although 81.4% of 
first time first-year students attended college in the 
state in which they lived, this percent varied consider- 
ably over the states. Here is a stemplot of the percent 
of first-year students in each of the 50 states who were 
from the state where they enrolled. The stems are 10s 
and the leaves are 1s. The stems have been split in the 
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The midpoint of this distribution is 
(a) 65% 
(b) 80% 
(c) 83% 


The shape of the distribution in Exercise 1.19 is 


(a) clearly skewed to the left with two low outliers. 
(b) roughly symmetric. 
(c) clearly skewed to the right with two low outliers. 


The state with the largest percent of first-year students 
enrolled in the state has 


(a) 91.1% enrolled. 
(b) 92% enrolled. 
(c) 92.22% enrolled. 


You look at real estate ads for houses in Naples, Florida. 
There are many houses ranging from $200,000 to 
$500,000 in price. The few houses on the water, how- 
ever, have prices up to $15 million. The distribution of 
house prices will be 

(a) skewed to the left. 

(b) roughly symmetric. 

(c) skewed to the right. 
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Medical students. Students who have finished 
medical school are assigned to residencies in hospi- 
tals to receive further training in a medical specialty. 
Here is part of a hypothetical database of students 
seeking residency positions. USMLE is the student’s 
score on Step 1 of the national medical licensing 
examination. 


NAME 


SPECIALTY 
SOUGHT 


MEDICAL 


SCHOOL SEX AGE  USMLE 


Abrams, 
Laurie 


Family 


Florida F 28 238 medicine 


Brown, 


Gordon 


Meharry M 25 205 Radiology 


Cabrera, 
Maria 


Tufts F 26 191 Pediatrics 


Ismael, 


Miranda 


Internal 
medicine 


Indiana F Bz 245 
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(a) What individuals does this data set describe? 
(b) In addition to the student’s name, how many 
variables does the data set contain? Which of 
these variables are categorical, and which are quan- 
titative? If a variable is quantitative, what units is it 
measured in? 


Buying a refrigerator. Consumer Reports will have 
an article comparing refrigerators in the next issue. 
Some of the characteristics to be included in the re- 
port are the brand name and model; whether it has 
a top, bottom, or side-by-side freezer; the estimated 
energy consumption per year (kilowatts); whether or 
not it is Energy Star compliant; the width, depth, and 
height in inches; and both the freezer and refrigerator 
net capacity in cubic feet. Which of these variables 
are categorical, and which are quantitative? Give the 
units for the quantitative variables and the categories 
for the categorical variables. What are the individuals 
in the report? 


What color is your car? The most popular colors 
for cars and light trucks vary with region and over 
time. In North America white remains the top color 
choice, with silver the top choice in South Amer- 
ica and white the top choice worldwide for the 
third consecutive year. Here is the distribution of 
the top colors for vehicles sold globally in 2013:'¢ 
jy,| CARCOLOR 


COLOR POPULARITY 
White 25% 
Black 18% 
Silver 18% 
Gray 12% 
Red 9% 
Beige, brown 8% 
Blue 71% 


Other colors 


Fill in the percent of vehicles that are in other colors. Make 
a graph to display the distribution of color popularity. 


Facebook, Twitter, and LinkedIn users. After years of 
explosive growth in number of users of social network- 
ing sites in all age ranges and demographics, it is hard 
to argue that social media haven't changed forever 
how we interact and connect online. Although Face- 
book is still the dominant player in social networking, 
both Twitter and LinkedIn have continued to increase 
their usage. Here is the age distribution of the users for 
the three sites in 2013: By)) sociant 


FACEBOOK TWITTER LINKEDIN 

AGE GROUP USERS USERS USERS 

13 to 17 years 10% 10% 4% 
18 to 24 years 14% 18% 10% 
25 to 34 years 19% 22% 20% 
35 to 44 years 17% 17% 18% 
45 to 54 years 17% 15% 20% 
55 to 64 years 13% 11% 17% 
Over 65 years 10% 7% 11% 


(a) Draw a bar graph for the age distribution of 
Facebook users. The leftmost bar should correspond 
to “13 to 17,” the next bar to “18 to 24,” and so on. 
Do the same for Twitter and LinkedIn, using the 
same scale for the percent axis. 

(b) Describe the most important difference in the age dis- 
tribution of the audience for these three social networking 
sites. How does this difference show up in the bar graphs? 
Do you think it was important to order the bars by age to 
make the comparison easier? Why or why not? 

(c) Explain why it is appropriate to use a pie chart to 
display any of these distributions. Draw a pie chart for 
each distribution. Do you think it is easier to compare 
the three distributions with bar graphs or pie charts? 
Explain your reasoning. 
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1.29 Canadian students rate their universities. 


Deaths among young people. Among persons aged 15 
to 24 years in the United States, the leading causes of 
death and number of deaths in 2011 were: accidents, 
12,032; suicide, 4688; homicide, 4508; cancer, 1609; 
heart disease, 948; congenital defects, 429.!8 


(a) Make a bar graph to display these data. 
(b) To make a pie chart, you need one additional 
piece of information. What is it? 


Hispanic origins. According to the 2010 U.S. Census, 
308.7 million people resided in the United States on 
April 1, 2010, of which 50.5 million (or 16%) were of 
Hispanic origin. What countries do they come from? 
Figure 1.13 is a pie chart to show the country of ori- 
gin of Hispanics in the United States in 2010.’ About 
what percent of Hispanics are Mexican? Puerto Rican? 
You see that it is hard to determine numbers from a pie 
chart. Bar graphs are much easier to use. (The U.S. 
Census Bureau includes the percents in many of its pie 
charts.) 


Percent Distribution of Hispanics 
by Origin: 2010 


Puerto Rican 
Cuban 


Central American 


South American 


Mexican Other Hispanic 


FIGURE 1.13 
Pie chart of the national origins of Hispanic residents 
of the United States, for Exercise 1.28. 


The Na- 


tional Survey of Student Engagement asked students 


UNIVERSITY EXCELLENT RATING 
Toronto 21% 
York 18% 
Alberta 23% 
Ottawa 11% 
Western Ontario 38% 
British Columbia 18% 
Calgary 14% 
McGill 26% 
Waterloo 36% 
Concordia 21% 
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10 


Number of subjects 
5 
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at many universities, “How would you evaluate your 
entire educational experience at this university? ” 
Here are the percents of senior-year students at Cana- 
da’s 10 largest primarily English-speaking universities 


who responded “Excellent”:”° 


(a) The list is arranged in order of undergraduate 
enrollment. Make a bar graph with the bars in order 
of student rating. 

(b) Explain carefully why it is not correct to make a 
pie chart of these data. 


Do adolescent girls eat fruit? We all know that 
fruit is good for us. Many of us don’t eat enough. 
Figure 1.14 is a histogram of the number of servings 
of fruit per day claimed by 74 seventeen-year-old girls 
in a study in Pennsylvania.”! Describe the shape, cen- 
ter, and variability of this distribution. Are there any 
outliers? What percent of these girls ate six or more 
servings per day? How many of these girls ate fewer 
than two servings per day? 


2 3 A §S§ G6 F 8 
Servings of fruit per day 


[o} 
KH 


FIGURE 1.14 


The dist 
girls, for 
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ribution of fruit consumption in a sample of 74 seventeen-year-old 
Exercise 1.30. 


IQ test scores. Figure 1.15 (see page 40) is a stemplot 
of the IQ test scores of 78 seventh-grade students in a 
rural midwestern school.” £y)) 


(a) Four students had low scores that might be 
considered outliers. Ignoring these, describe the 
shape, center, and variability of the remainder of 
the distribution. 

(b) We often read that IQ scores for large populations 
are centered at 100. What percent of these 78 students 
have scores above 100? 
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GP || Dc (b) What is the approximate center of this distribu- 

U | Ve tion? (For now, take the center to be the value with 

8 roughly half the months having lower returns and 
38/169 half having higher returns.) 

910133 (c) Approximately what were the smallest and largest 
OMGwA7aS monthly returns, leaving out the outliers? (This is one 
10 |0022333344 way to describe the variability of the distribution.) 
10 | 555666777789 (d) A return less than zero means that stocks lost 
1110000111122223334444 value in that month. About what percentage of all 
1155688999 months had returns less than zero? 

12 |003344 1.33 Name that variable. A survey of a large college class 
12|677888 asked the following questions: 
13 | 02 1. Are you female or male? (In the data, male = 0 
13 | 6 female = 1.) 


2. Are you right-handed or left-handed? (In the 
FIGURE 1.15 data, right = 0, left = 1.) 
The distribution of |Q scores for 78 seventh-grade 3. What is your height in inches? 
tudents, for Exercise 1.31. ‘ : : 
eel seis 4. How many minutes do you study on a typical 


; weeknight? 
1.32 Returns on common stocks. The return on a stock is 8 


the change in its market price plus any dividend pay- 
ments made. Total return is usually expressed as a percent 
of the beginning price. Figure 1.16 is a histogram of the 
distribution of the monthly returns for all stocks listed on 
US. markets from January 1985 to December 2013 (348 
months).” The extreme low outlier is the market crash 


of October 1987, when stocks lost 23% of their value in 

one month. The other two low outliers are 16% during 

August 1998, a month when the Dow Jones Industrial 

Average experienced its second-largest drop in history to 

that time, and the financial crisis in October 2008, when 

stocks lost 17% of their value. £yj,) stockret 

(a) Ignoring the outliers, describe the overall shape 

of the distribution of monthly returns. I q 


Figure 1.17 shows histograms of the student responses, 
in scrambled order and without scale markings. 
Which graph goes with each variable? Explain your 
reasoning. 


je) 
8 4 
oF 
Oo 
23] 
t= ie) J 
5s FIGURE 1.17 
2 Sy Histograms of four distributions, for Exercise 1.33. 
So] 
= 
3 o} 1.34 Food oils and health. Fatty acids, despite their un- 
5 a pleasant name, are necessary for human health. Two 
22. types of essential fatty acids, called omega-3 and ome- 
S14 ga-6, are not produced by our bodies and so must be 
of obtained from our food. Food oils, widely used in food 
i ee oe processing and cooking, are major sources of these 
Monthly percent return on common stocks compounds. There is some evidence that a healthy diet 
should have more omega-3 than omega-6. Table 1.4 
FIGURE 1.16 gives the ratio of omega-3 to omega-6 in some common 


1. 24 . 
The distribution of monthly percent returns on U.S. common stocks from food oils.* Values greater than 1 show that an oil has 


January 1985 to December 2013, for Exercise 1.32. more omega-3 than omega-6. 
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OIL RATIO OIL RATIO 

Perilla 533) Flaxseed 3.56 

Walnut 0.20 Canola 0.46 

Wheat germ ORB Soybean 0.13 

Mustard 0.38 Grape seed 0.00 

Sardine 2.16 Menhaden 1.96 

Salmon 2.50 Herring 2.67 

Mayonnaise 0.06 Soybean, hydrogenated 0.07 

Cod liver 2.00 Rice bran 0.05 

Shortening (household) 0.11 Butter 0.64 

Shortening (industrial) 0.06 Sunflower 0.03 

Margarine 0.05 Corn 0.01 

Olive 0.08 Sesame 0.01 

Shea nut 0.06 Cottonseed 0.00 

Sunflower (oleic) 0.05 Palm 0.02 

Sunflower (linoleic) 0.00 Cocoa butter 0.04 
(a) Make a histogram of these data, using classes 1.35 Where are the nurses? Table 1.5 gives the number of ac- 
bounded by the whole numbers from 0 to 6. tive nurses per 100,000 people in eachstate.” 2j))| NURSES 


(b) What is the shape of the distribution? How many 
of the 30 food oils have more omega-3 than omega-6? 
What does this distribution suggest about the possible 
health effects of modern food oils? 

(c) Table 1.4 contains entries for several fish oils 
(cod, herring, menhaden, salmon, sardine). How 


do these values support the idea that eating fish is 
healthy? 


(a) Why is the number of nurses per 100,000 people 

a better measure of the availability of nurses than a 
simple count of the number of nurses in a state? 

(b) Make a histogram that displays the distribution of 
nurses per 100,000 people. Write a brief description of 
the distribution. Are there any outliers? If so, can you 
explain them? 


STATE NURSES STATE NURSES STATE NURSES 
Alabama 912 Louisiana 894 Ohio 1001 
Alaska 756 Maine 1053 Oklahoma 712 
Arizona 544 Maryland 869 Oregon 795 
Arkansas 774 Massachusetts 1210 Pennsylvania 1017 
California 641 Michigan 841 Rhode Island 1007 
Colorado 761 Minnesota 1017 South Carolina 795 
Connecticut 994 Mississippi 868 South Dakota VAS 
Delaware 977 Missouri 958 Tennessee 894 
Florida 814 Montana 748 Texas 662 
Georgia 653 Nebraska 1010 Utah 625 
Hawaii 753 Nevada 574 Vermont 912 
Idaho 642 New Hampshire 970 Virginia 750 
Illinois 812 New Jersey 907 Washington 774 
Indiana 864 New Mexico 580 West Virginia 938 
Iowa 990 New York 859 Wisconsin 905 
Kansas 867 North Carolina 886 Wyoming 812 
Kentucky 923 North Dakota 1097 Dist. of Columbia 1380 


42 
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COUNTRY co, COUNTRY co, COUNTRY co, 

Algeria 3.3315 Indonesia 1.8032 Poland 8.3086 
Argentina 4.4710 Iran 7.6765 Russia e225) 
Bangladesh 0.3716 Iraq 3.7034 South Africa 9.2041 
Brazil 2.1503 Italy 6.7177 Spain 5.8535 
Canada 14.6261 Japan 9.1857 Sudan 0.3109 
China 6.1949 Kenya 0.3038 Tanzania Ob? 
Colombia 1.6295 Korea, South 11.4869 Thailand 4.4469 
Congo 0.4932 Mexico 3.7636 Turkey 4.1310 
Egypt 2.6228 Morocco 1.5994 Uganda 0.1113 
Ethiopia 0.0746 Myanmar 0.1732 Ukraine 6.6449 
France 5.5554 Nigeria 0.4941 United Kingdom MES 2olll 
Germany 9.1148 Pakistan 0.9321 United States 17.5642 
India 1.6662 Philippines 0.8731 Vietnam 1.7281 


Carbon dioxide emissions. Burning fuels in pow- 
er plants and motor vehicles emits carbon dioxide 
(CO;), which contributes to global warming. Table 
1.6 displays the 2010 CO, emissions per person from 
countries with populations of at least 30 million in that 


year.” el) CO2EMISS 


(a) Why do you think we choose to measure emis- 
sions per person rather than total CO, emissions for 
each country? 

(b) Make a stemplot to display the data of Table 1.6. 
The data will first need to be rounded (see page 30). 
What units are you going to use for the stems? The 
leaves? You should round the data to the units you 
are planning to use for the leaves before drawing the 
stemplot. Describe the shape, center, and variability 
of the distribution. Which countries are outliers? 


Fur seals on St. Paul Island. Every year hundreds of 
thousands of northern fur seals return to their haul- 
outs in the Pribilof Islands in Alaska to breed, give 
birth, and teach their pups to swim, hunt, and survive 
in the Bering Sea. U.S. commercial fur sealing opera- 
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tions continued on St. Paul until 1984, but despite a 
reduction in harvest, the population of fur seals has 
continued to decline. Possible reasons include climate 
shifts in the North Pacific, changes in the availability 
of prey, and new or increased interaction with com- 
mercial fisheries that increase mortality. Here are data 
on the estimated number of fur seal pups born on St. 
Paul Island (in thousands) from 1979 to 2012, where a 
dash indicates a year in which no data were collected:”’ 


PUPS BORN PUPS BORN 

YEAR (THOUSANDS) YEAR (THOUSANDS) 
1979) 245.93 1996 170.12 
1980 203.82 1997 = 
1981 OTA, 1998 Oa) 
1982 203.58 1999 = 
1983 165.94 2000 158.74 
1984 173.27 2001 = 
1985 182.26 2002 a et2 
1986 167.66 2003 = 
1987 171.61 2004 122.82 
1988 202-73) 2005 = 
1989 171.53 2006 109.96 
1990 201.30 2007 = 
1991 — 2008 102.67 
1992 182.44 2009 = 
1993 — 2010 94.50 
1994 192.10 2011 = 
1995 — 2012 96.83 
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Make a stemplot to display the distribution of pups 
born per year. Describe the shape, center, and vari- 
ability of the distribution. Are there any outliers? 
FURSEALS 

Nintendo and laparoscopic skills. In laparoscopic sur- 
gery, a video camera and several thin instruments are 
inserted into the patient’s abdominal cavity. The sur- 
geon uses the image from the video camera positioned 
inside the patient’s body to perform the procedure by 
manipulating the instruments that have been inserted. 
It has been found that the Nintendo Wii™ reproduces 
the movements required in laparoscopic surgery more 
closely than other video games with its motion-sensing 
interface. If training with a Nintendo Wii™ can im- 
prove laparoscopic skills, it can complement the more 
expensive training on a laparoscopic simulator. Forty- 
two medical residents were chosen, and all were tested 
ona set of basic laparoscopic skills. Twenty-one were se- 
lected at random to undergo systematic Nintendo Wii™ 
training for one hour a day, five days a week, for four 
weeks. The remaining 21 residents were given no Nin- 
tendo Wii™ training and asked to refrain from video 
games during this period. At the end of four weeks, all 
42 residents were tested again on the same set of lapa- 
roscopic skills. One of the skills involved a virtual gall 
bladder removal, with several performance measures 
including time to complete the task recorded. Here are 
the improvement (before — after) times in seconds after 
four weeks for the two groups:”° NINTENDO 


TREATMENT CONTROL 


281 134 


186 128 84 243 266545585) 229) 97, 
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145 110 32 90 46 —81 


71 


77 


144 68 61 44 


back-to- 
back 
stemplots 


(a) In the context of this study, what do the negative 
values in the data set mean? 

(b) Back-to-back stemplots can be used to compare 
the two samples. That is, use one set of stems with two 
sets of leaves, one to the right and one to the left of the 
stems. (Draw a line on either side of the stems to sepa- 
rate stems and leaves.) Order both sets of leaves from 
smallest at the stem to largest away from the stem. 
Complete the back-to-back stemplot started below. 
The data have been rounded to the nearest 10, with 
stems being 100s and leaves being 10s. The stems have 
been split. The first control observation corresponds to 
—80 and the next two to —30 and —10. 
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(c) Report the approximate midpoints of both groups. 
Does it appear that the treatment has resulted in a 
greater improvement in times than seen in the control 
group? (To better understand the magnitude of the 
improvements, note that the median time to complete 
the task on the first occasion was 11 minutes and 

40 seconds, using the times of all 42 residents.) 


Fur seals on St. Paul Island. Make a time plot of the 
number of fur seals born per year from Exercise 1.37. 
What does the time plot show that your stemplot in 
Exercise 1.37 did not show? When you have data col- 
lected over time, a time plot is often needed to under- 
stand what is happening. FURSEALS 


Marijuana and traffic a” Researchers in New 
Zealand interviewed 907 drivers at age 21. They had 
data on traffic accidents, and they asked the drivers 
about marijuana use. Here are data on the numbers 
of accidents caused by these drivers at age 19, broken 
down by marijuana use at the same age: 


MARIJUANA USE PER YEAR 


1-10 11-50 51+ 
NEVER TIMES TIMES TIMES 

Accidents caused 59 36 15 50 
Drivers 452 229 70 156 


1.41 


(a) Explain carefully why a useful graph must 
compare rates (accidents per driver) rather than 
counts of accidents in the four marijuana use classes. 
(b) Compute the accident rates in the four marijuana 
use classes. After you have done this, make a graph 
that displays the accident rate for each class. What do 
you conclude? (You can’t conclude that marijuana use 
causes accidents, because risk takers are more likely 
both to drive aggressively and to use marijuana.) 


Dates on coins. Sketch a histogram for a distribution 
that is skewed to the left. Suppose that you and your 
friends emptied your pockets of coins and recorded the 
year marked on each coin. The distribution of dates 
would be skewed to the left. Explain why. 
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In-state tuition 
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She sounds tall! Presented with recordings of a pair of 
people of the same sex speaking the same phrase, can 
a listener determine which speaker is taller simply from 
the sound of their voice? Twenty four young adults at 
Washington University listened to 100 pairs of speak- 
ers, and within each pair were asked to indicate which 
of the two speakers was the taller. Here are the number 
correct (out of 100) for each of the 24 participants:*° 
#1),| HEARING 


65 61 67 59 58 62 56 67 61 67 63 53 
68 49 66 58 69 70 65 56 68 56 58 70 


Researchers believe that the key to correct discrimi- 
nation is contained in a particular type of sound pro- 
duced in the lower airways or the lungs, known as 
subglottal resonances, whose frequency is lower for 
taller people. Despite the masking of these resonances 
by other voice sounds, researchers wondered whether 
the information they contained could still be heard by 
listeners and used to identify the taller person. 


(a) Make two stemplots, with and without splitting 
the stems. Which plot do you prefer and why? 

(b) Describe the shape, center, and variability of the 
distribution. Are there any outliers? 

(c) If the experimental subjects are just guessing 
which speaker is taller, they should correctly identify 
the taller person about 50% of the time. Does this 
data support the researchers, conjecture that there 
is information in a person’s voice to help identify the 
taller person? Why or why not? 

Watch those scales! Figures 1.18(a) and 1.18(b) both 


show time plots of tuition charged to in-state students 
from 1980 through 2013.7! 


oO T T T T T T T 
1980 1985 1990 1995 2000 2005 2010 2015 
Year 


(a) 


FIGURE 1.18 
Time plots of in-state tuition between 1980 and 2013, for Exercise 1.43. 


(a) Which graph appears to show the biggest increase 
in tuition between 2000 and 2013? 

(b) Read the graphs and compute the actual increase 
in tuition between 2000 and 2013 in each graph. Do 
you think these graphs are for the same or different 
data sets? Why? 


The impression that a time plot gives depends on the 
scales you use on the two axes. Changing the scales can 
make tuition appear to increase very rapidly or to have 
only a gentle increase. The moral of this exercise is: 
Always pay close attention to the scales when you look 
at a time plot. 


1.44 Housing starts. Figure 1.19 (see page 45) is a time 


plot of the number of single-family homes started 
by builders each month from January 1990 through 
December 2013.” The counts are in thousands of 
homes. £jj,] HousING 


(a) The most notable pattern in this time plot is 
yearly up-and-down cycles. At what season of the 
year are housing starts highest? Lowest? The cycles 
are explained by the weather in the northern part 
of the country. 

(b) Is there a longer-term trend visible in addition to 
the cycles? If so, describe it. 

(c) The big economic news of 2007 was a severe 
downturn in housing that began in mid-2006. This 
was followed by the financial crisis in 2008. How are 
these economic events reflected in the time plot? 

(d) How would you describe the behavior of the time 
plot since January 2011? 


1.45 Ozone hole. The ozone hole is a region in the 


In-state tuition 


stratosphere over the Antarctic with exceptionally 
depleted ozone. The size of the hole is not constant 
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FIGURE 1.19 @ 
Time plot of the monthly count of new single-family houses started 
(in thousands) between January 1990 and December 2013, for 
Exercise 1.44. 
over the year but is largest at the beginning of the 


Southern Hemisphere spring (August—October). 


The increase in the size of the ozone hole led 
Montreal Protocol in 1987, an international 


to the 
trea- 


ty designed to protect the ozone layer by phasing 
out the production of substances, such as chloro- 
fluorocarbons (CFCs), believed to be responsible for 


ozone depletion. The following table gives the 


aver- 


age ozone hole size for the period September 7 to 
October 13 for each of the years from 1979 through 
2013 (note that no data were acquired in 1995). 


To get a better feel for the magnitude of the 


nhum- 


bers, the area of North America is approximately 


24.5 million square kilometers (km’). 


OZONE 


The two parts of this exercise will have you draw two 


graphs of these data. 


(a) First make a time plot of the data. The sever- 


ity of the ozone hole will vary from year to year 
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depending on the meteorology of the atmosphere 
above Antarctica. Does the time plot illustrate only 
year-to-year variation or are other patterns appar- 
ent? Specifically, is there a trend over any period of 
years? What about cyclical fluctuation? Explain in 
words the change in the average size of the ozone 
hole over this 35-year period. 

(b) Now make a stemplot of the data. What is the 
midpoint of the distribution of ozone hole size? Do 
you think that the stemplot and the midpoint are a 
good description of this data set? Is there important 
information in the time plot that is not contained in 
the stemplot? When data are collected over time, you 
should always make a time plot. 


Choosing class intervals. Student engineers learn 
that, although handbooks give the strength of a mate- 
rial as a single number, in fact the strength varies from 
piece to piece. A vital lesson in all fields of study is that 
“variation is everywhere.” Here are data from a typical 
student laboratory exercise: the load in pounds needed 
to pull apart pieces of Douglas fir 4 inches long and 1.5 
inches square. 


33,190 31,860 B20 26,520 33,280 
32,320 33,020 32,030 30,460 32,700 
23,040 30,930 32,720 33,650 32,340 
24,050 30,170 31,300 28,730 31,920 


The data sets in the One-Variable Statistical 
Calculator applet on the text website include the 
“pulling wood apart” data given in this exercise. 
How many class intervals does the applet choose 
when drawing the histogram? Use the applet to 
make several histograms with a larger number of 
class intervals. Are there any important features of the 
data that are revealed using a larger number of class 
intervals? Which histogram do you prefer? Explain 
your choice. 


AREA AREA AREA 

YEAR (MILLIONS OF km?) YEAR (MILLIONS OF km?) YEAR (MILLIONS OF km?) 
1979 0.1 1991 18.8 2003 25.8 
1980 Lr 1992 Dies: 2004 15 
1981 0.6 1993 2a 2005 Bete 
1982 4.8 1994 23.6 2006 26.6 
1983 US) 1995 = 2007 22.0 
1984 10.1 1996 ZU 2008 252) 
1985 Mee 1997 22 2009 22.0 
1986 11.3 1998 PENS) 2010 19.4 
1987 I) 1999 23.2 2011 24.7 
1988 10.0 2000 24.8 2012 17.8 
1989 18.7 2001 25.0 2013 21.0 
1990 19.2 2002 12.0 
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CHAPTER 1 @ Picturing Distributions with Graphs 


Exploring the Web 


Natural gas prices. The Department of Energy website contains information 
about monthly wholesale and retail prices for natural gas in each state. Go to 
www.eia.doe.gov/naturalgas/data.cfm and then click on the link 
Monthly Wholesale and Retail Prices. Under Area, choose a state of interest 
to you, make sure the Period is monthly, and then under Residential Price click 
on View History. A window will open with a time plot covering approximately a 
25-year period, along with a table of the monthly residential prices for each year. 


(a) If you have access to statistical software, you should use the Download Data 
(XLS file) link to save the data as an Excel (.xls) file on your computer. Then enter 
the data into your software package, and reproduce the time series plot using the 
graphic capabilities of your software package. Be sure you use an appropriate title 
and axis labels. If you do not have access to appropriate software, provide a rough 
sketch of the time plot that is given on the website. 

(b) Is there a regular pattern of seasonal variation that repeats each year? De- 
scribe it. Are the prices increasing over time? 


Hank Aaron’s home run record. The all-time home run leader prior to 2007 
was Hank Aaron. You can find his career statistics by going to the website 
www.baseball-reference.com and then clicking on the Players tab at 
the top of the page and going to Hank Aaron. 


(a) Make a stemplot or a histogram of the number of home runs that Hank 
Aaron hit in each year during his career. Is the distribution roughly symmetric, 
clearly skewed, or neither? About how many home runs did Aaron hit in a typi- 
cal year? Are there any outliers? 

(b) Would a time plot be appropriate for these data? If so, what information 
would be included in the time plot that is not in the stemplot? 

(c) How has the average number of home runs per game changed over time? 
You can find information about average home runs per game on the website 
www.baseball-reference.com/leagues/MLB/bat.shtml. Give 
an appropriate plot and describe any patterns that you see. Do you see any 
difficulties comparing the home run records of players whose careers spanned 
different eras? Explain. 


Logan Mock-Bunting/Getty Images 


Describing Distributions 
with Numbers 


e saw in Chapter 1 (page 14) that the American Community Survey asks, 

among much else, workers’ travel times to work. Here are the travel times 

in minutes for 15 workers in North Carolina, chosen at random by the U.S. 
Census Bureau:! 


BO 20) 10” 40. 23 20 10 60) Th AO 3 30) 2 1 10 


We aren't surprised that most people estimate their travel time in multiples of 
5 minutes. Here is a stemplot of these data: 


0|5 

I | ROTOTOO 255 
A | OO 

3) 00 
4/00 

5 

6|0O 


The distribution is single peaked and right-skewed. The longest travel time 
(60 minutes) may be an outlier. Our goal in this chapter is to describe with numbers 
the center and variability of this and other distributions. 


CHAPTER 


>, 


In this chapter 
we cover... 


2.1 
2.2 
2.3 


2.4 


2.5 


2.6 


2.7 


2.8 


2.9 
2.10 


Measuring center: the mean 
Measuring center: the median 


Comparing the mean and the 
median 


Measuring variability: 
the quartiles 


The five-number summary 
and boxplots 


Spotting suspected outliers 
and modified boxplots* 


Measuring variability: the 
standard deviation 


Choosing measures of center 
and variability 


Using technology 


Organizing a statistical 
problem 
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FA] DON’T HIDE THE 
win OUTLIERS 


Data from an airliner’s control surfaces, 
such as the vertical tail rudder, go to 
cockpit instruments and then to the 
“black box” flight data recorder. To 
avoid confusing the pilots, short erratic 
movements in the data are “smoothed” 
so that the instruments show overall 
patterns. When a crash killed 260 
people, investigators suspected a 
catastrophic movement of the tail 
rudder. But the black box contained 
only the smoothed data. Sometimes 
outliers are more important than the 
overall pattern. 


NCTRAVEL 


resistant measure 


The most common measure of center is the ordinary arithmetic average, or mean. 


To find the mean of a set of observations, add their values and divide by the number 


of observations. If the n observations are x), X2,... , Xn, their mean is 


= 5G) TP BG IP 88° SP oe, 
x= 


n 


or, in more compact notation, 


x= Sx 


The > (capital Greek sigma) in the formula for the mean is short for “add them 
all up.” The subscripts on the observations x; are just a way of keeping the n observa- 
tions distinct. They do not necessarily indicate order or any other special facts about 
the data. The bar over the x indicates the mean of all the x-values. Pronounce the 
mean X as “x-bar.” This notation is very common. When writers who are discussing 
data use x or y, they are talking about a mean. 


The mean travel time of our 15 North Carolina workers is 


X, Xp te + X, 


x= 
n 
_ 80% 20 Ae TO 
15 
337 
= — = 22,5 minutes 
15 


In practice, you can enter the data into your calculator and ask for the mean. You don’t 
have to actually add and divide. But you should know that this is what the calculator 
is doing. 

Notice that only 6 of the 15 travel times are larger than the mean. If we leave out 
the longest single travel time, 60 minutes, the mean for the remaining 14 people is 
19.8 minutes. That one observation raises the mean by 2.7 minutes. m 


Example 2.1 illustrates an important fact about the mean as a measure of center: 
it is sensitive to the influence of a few extreme observations. These may be outliers, 
but a skewed distribution that has no outliers will also pull the mean toward its long 
tail. Because the mean cannot resist the influence of extreme observations, we say 
that it is not a resistant measure of center. 


Apply Your Knowledge 


2.1 E. coliin Swimming Areas. To investigate water quality, the Columbus Dis- 
patch took water specimens at 16 Ohio State Park swimming areas in central 
Ohio. Those specimens were taken to laboratories and tested for E. coli, which 
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are bacteria that can cause serious gastrointestinal problems. For reference, 
if a 100-milliliter specimen (about 3.3 ounces) of water contains more than 
130 E. coli bacteria, it is considered unsafe. Here are the E. coli levels per 100 
milliliters found by the laboratories:” 


291.0 10.9 47.0 86.0 44.0 189 1.0 50.0 
190.4 45.7 285 18.9 16.0 34.0 86 9.6 


Find the mean E. coli level. How many of the lakes have E. coli levels greater 
than the mean? What feature of the data explains the fact that the mean is 
greater than most of the observations? ECOLI 


2.2 Health Care Spending. Table 1.3 (page 32) gives the 2011 health care ex- 
penditure per capita in 35 countries with the highest gross domestic product in 
2011. The United States, at 8608 international dollars per person, is a high out- 
lier. Find the mean health care spending in these nations with and without the 
United States. How much does the one outlier increase the mean? ely HEALTH 


In Chapter 1, we used the midpoint of a distribution as an informal measure of 
center and gave a method for its computation. The median is the formal version of 
the midpoint, and we now provide a more detailed rule for its calculation. 


The Median M 


The median M is the midpoint of a distribution, the number such that half the obser- 
vations are smaller and the other half are larger. To find the median of a distribution: 


1. Arrange all observations in order of size, from smallest to largest. 

2. If the number of observations n is odd, the median M is the center observation 
in the ordered list. If the number of observations n is even, the median M is 
midway between the two center observations in the ordered list. 


3. You can always locate the median in the ordered list of observations by counting 
up (n + 1)/2 observations from the start of the list. 


Note that the formula (n + 1)/2 does not give the median, just the location 
of the median in the ordered list. Medians require little arithmetic, so they are 
easy to find by hand for small sets of data. Arranging even a moderate num- 

ber of observations in order is very tedious, however, so finding the median by hand 
for larger sets of data is unpleasant. Even simple calculators have an x button, but you 
will need to use software or a graphing calculator to automate finding the median. 


What is the median travel time for our 15 North Carolina workers? Here are the data 
arranged in order: 


5 10 10 10 10 12 15 20 20 25 30 30 40 40 60 evel 


The count of observations, n = 15, is odd. The bold 20 is the center observation in 
the ordered list, with seven observations to its left and seven to its right. This is the 
median, M = 20 minutes. 
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Because n = 15, our rule for the location of the median gives 


nt+1 16 _ 
2 2 


location of M 8 


That is, the median is the eighth observation in the ordered list. It is faster to use this 
rule than to locate the center by eye. m 


Carolina. Here are the travel times in minutes of 20 randomly chosen New York 


ey Travel times to work in New York State are (on the average) longer than in North 
workers: 


NYTRAVEL 
10 30 5 25 40 20 10 15 30 20 15 20 85 15 65 15 60 60 40 45 


A stemplot not only displays the distribution but also makes finding the median easy 
because it arranges the observations in order: 


5 
005555 
0005 
(oxo) 

O05 


O05 


Mitchell Funk/Getty Images 


P®Noahaun—-O 


5 


The distribution is single peaked and right-skewed, with several travel times of an hour 
or more. There is no center observation, but there is a center pair. These are the bold 
20 and 25 in the stemplot, which have nine observations before them in the ordered 
list and nine after them. The median is midway between these two observations: 


20+ 25 
2 


= 22.5 minutes 


With n = 20, the rule for locating the median in the list gives 


n+1_ 24 
2 2 


location of M = 10.5 


The location 10.5 means “halfway between the 10th and 11th observations in the 
ordered list.” That agrees with what we found by eye. m 


Examples 2.1 and 2.2 illustrate an important difference between the mean and the 
median. The median travel time (the midpoint of the distribution) is 20 minutes. The 
mean travel time is higher, 22.5 minutes. The mean is pulled toward the right tail of 
this right-skewed distribution. The median, unlike the mean, is resistant. If the longest 
travel time were 600 minutes rather than 60 minutes, the mean would increase to more 


& than 58 minutes but the median would not change at all. The outlier just counts as one 


observation above the center, no matter how far above the center it lies. The mean uses 
the actual value of each observation and so will chase a single large observation upward. 
The Mean and Median applet is an excellent way to compare the resistance of M and x. 
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Comparing the Mean and the Median 


The mean and median of a roughly symmetric distribution are close together. If the 
distribution is exactly symmetric, the mean and median are exactly the same. In a 
skewed distribution, the mean is usually farther out in the long tail than the median.’ 


Many economic variables have distributions that are skewed to the right. 
For example, the median endowment of colleges and universities in the United 
States and Canada in 2013 was about $101 million—but the mean endow- 
ment was almost $537 million. Most institutions have modest endowments, but 
a few are very wealthy. Harvard’s endowment was over $32 billion. The few 
wealthy institutions pull the mean up but do not affect the median. Reports 
about incomes and other strongly skewed distributions usually give the median 
(“midpoint”) rather than the mean (“arithmetic average”). However, a county 
that is about to impose a tax of 1% on the incomes of its residents cares about 
the mean income, not the median. The tax revenue will be 1% of total income, 
and the total is the mean times the number of residents. The mean and median 

measure center in different ways, and both are useful. Don’t confuse the 
A “average” value of a variable (the mean) with its “typical” value, which we 
might describe by the median. 


Apply Your Knowledge 


NYTRAVEL 


2.3 New York Travel Times. Find the mean of the travel times to work for the 
20 New York workers in Example 2.3. Compare the mean and median for 
these data. What general fact does your comparison illustrate? 


2.4 New House Prices. The mean and median sales prices of 
new homes sold in the United States in November 2013 were 
$270,900 and $340,300.° Which of these numbers is the mean 
and which is the median? Explain how you know. 


2.5 Carbon Dioxide Emissions. Table 1.6 (page 42) gives the 
2010 carbon dioxide (CO,) emissions per person for countries 
with populations of at least 30 million. Find the mean and the 
median for these data. Make a histogram of the data. What fea- 
tures of the distribution explain why the mean is larger than the 
median? jj) cozemiss 


The mean and median provide two different measures of the center of a distri- 
bution. But a measure of center alone can be misleading. The U.S. Census Bureau 
reports that in 2012 the median income of American households was $51,017. Half 
of all households had incomes below $51,017, and half had higher incomes. The 
mean was much higher, $71,274, because the distribution of incomes is skewed to 
the right. But the median and mean don’t tell the whole story. The bottom 20% 
of households had incomes less than $20,593, and households in the top 5% took 

in more than $191,156.° We are interested in the variability of incomes as 
A well as their center. The simplest useful numerical description of a distribution 
requires both a measure of center and a measure of variability. 


Albert Valles Photography/Getty Images 
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One way to measure variability is to give the smallest and largest observations. 
For example, the travel times of our 15 North Carolina workers range from 5 min- 
utes to 60 minutes. These single observations show the full variability of the data, 
but they may be outliers. We can improve our description of variability by also 
looking at the variability of the middle half of the data. The quartiles mark out the 
middle half. Count up the ordered list of observations, starting from the smallest. 
The first quartile lies one-quarter of the way up the list. The third quartile lies three- 
quarters of the way up the list. In other words, the first quartile is larger than 25% 
of the observations, and the third quartile is larger than 75% of the observations. 
The second quartile is the median, which is larger than 50% of the observations. 
That is the idea of quartiles. We need a rule to make the idea exact. The rule for 
calculating the quartiles uses the rule for the median. 


The Quartiles Q, and Q,; 


To calculate the quartiles: 

1. Arrange the observations in increasing order and locate the median, M, in the 
ordered list of observations. 

2. The first quartile, Q,, is the median of the observations whose position in the 
ordered list is to the left of the location of the overall median. 

3. The third quartile, Q,, is the median of the observations whose position in the 
ordered list is to the right of the location of the overall median. 


Here are examples that show how the rules for the quartiles work for both odd 
and even numbers of observations. 


a Our North Carolina sample of 15 workers’ travel times, arranged in increasing order, is 


5 10 10 10 10 12 15 20 20 25 30 30 40 40 60 
NCTRAVEL 
There is an odd number of observations, so the median is the middle one, the bold 


20 in the list. The first quartile is the median of the seven observations to the left of 
the median. This is the fourth of these seven observations, so Q; = 10 minutes. If 
you want, you can use the rule for the location of the median with n = 7: 


n+ 1 7+ 1 
2 2 


location of Q; = 


The third quartile is the median of the seven observations to the right of the median, 
Q; = 30 minutes. When there is an odd number of observations, leave out the overall 
median when you locate the quartiles in the ordered list. 

The quartiles are resistant because they are not affected by a few extreme obser- 
vations. For example, Q3 would still be 30 if the outlier were 600 rather than 60. m 


Here are the travel times to work of the 20 New York workers from Example 2.3, 
arranged in increasing order: 


NYTRAVEL 
5 10 10 15 15 15 15 20 20 20 | 25 30 30 40 40 45 60 60 65 85 


2.5 The Five-Number Summary and Boxplots 53 


There is an even number of observations, so the median lies midway between the 
middle pair, the 10th and 11th in the list. Its value is M = 22.5 minutes. We have 
marked the location of the median by |. The first quartile is the median of the first 
10 observations because these are the observations to the left of the location of 
the median. Check that Q; = 15 minutes and Q; = 42.5 minutes. When the number 
of observations is even, include all the observations when you locate the quartiles. m 


Be careful when, as in these examples, several observations take the same 
numerical value. Write down all the observations, arrange them in order, and apply 
the rules just as if they all had distinct values. 


The smallest and largest observations tell us little about the distribution as a 
whole, but they give information about the tails of the distribution that is missing if 
we know only the median and the quartiles. To get a quick summary of both center 
and variability, combine all five numbers. 


The Five-Number Summary 


The five-number summary of a distribution consists of the smallest observation, 
the first quartile, the median, the third quartile, and the largest observation, written 
in order from smallest to largest. In symbols, the five-number summary is 


Minimum Q, M Q; Maximum 


These five numbers offer a reasonably complete description of center and vari- 
ability. The fivenumber summaries of travel times to work from Examples 2.4 and 
2.5 are 


North Carolina | 5 10 20 30 #60 
New York 5 15 22.5 42.5 85 


The five-number summary of a distribution leads to a new graph, the boxplot. Figure 2.1 
shows boxplots comparing travel times to work in North Carolina and New York. 


Boxplot 


A boxplot is a graph of the five-number summary. 
= Acentral box spans the quartiles Q; and Q3. 
m Aline in the box marks the median M. 


m Lines extend from the box out to the smallest and largest observations. 


Because boxplots show less detail than histograms or stemplots, they are best used 
for side-by-side comparison of more than one distribution, as in Figure 2.1. Be sure 
to include a numerical scale in the graph. When you look at a boxplot, first locate 
the median, which marks the center of the distribution. Then look at the variability. 
The span of the central box shows the variability of the middle half of the data, and 
the extremes (the smallest and largest observations) show the variability of the entire 
data set. We see from Figure 2.1 that travel times to work are in general a bit longer in 
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FIGURE 2.1 
Boxplots comparing the travel times 56 


to work of samples of workers in ihaaed — 95 
North Carolina and New York. 


Third quartile = 42.5 


Median = 22.5 


First quartile = 15 
0) _ 


North Carolina New York 


Travel time to work (minutes) 


New York than in North Carolina. The median, both quartiles, and the maximum 
are all larger in New York. New York travel times are also more variable, as shown 
by the span of the box and the difference between the extremes. Note that the boxes 
with arrows in Figure 2.1 that indicate the location of the fivenumber summary are 
not part of the boxplot, but are included purely for illustration. 

Finally, the New York data are more strongly right-skewed. In a symmetric dis- 
tribution, the first and third quartiles are equally distant from the median. In most 
distributions that are skewed to the right, on the other hand, the third quartile will 
be farther above the median than the first quartile is below it. The extremes behave 
the same way, but remember that they are just single observations and may say little 
about the distribution as a whole. 


() LaunchPad Online Resources 


* The StatClips Examples video, Exploratory Pictures for Quantitative Data 
Example C, provides a worked out example of computing the five-number sum- 
mary for both odd and even sample sizes, and then draws a comparative boxplot. 


Apply Your Knowledge 


2.6 The Seattle Seahawks. The 2013-14 roster of the Seattle Seahawks, win- 
ners of the 2014 NFL Super Bowl, included 10 defensive linemen and 9 offensive 
linemen. The weights in pounds of the defensive linemen were £jjy,| SEAHAWKS 


274 


254 311 297 323 260 242 300 252 


and the weights of the offensive linemen were 


Chip Somodevilla/Getty Images 


310 315 305 318 298 301 321 332 320 


2.6 Spotting Suspected Outliers and Modified Boxplots 


(a) Make a stemplot of the weights of the defensive linemen and find the 
five-number summary. 

(b) Make a stemplot of the weights of the offensive linemen and find the 
five-number summary. 

(c) Does either group contain one or more clear outliers? Which group of 
players tends to be heavier? 


2.7 Fuel Economy for Midsize Cars. The Department of Energy provides 
fuel economy ratings for all cars and light trucks sold in the United States. 
Here are the estimated miles per gallon for city driving for the 205 cars clas- 
sified as midsize in 2014, arranged in increasing order:’ £j),| mipcars 


Ll Wh HS ie 1S) SS SS I) S 
15 15 16 16 16 16 16 16 16 16 16 16 16 16 16 16 17 17 
LL NM IS SSIS SSS 1S MIS M1318 
Is) lies sy Is} he a) US) IS URS TK) I) a) I) ea) 
OY HC) 70) 720) 10) oy 70) 20) FQ) HO) 20) 720) FO) HO) 20) OD). Il 
Bi A Aik Al Bil Biv Bil al Bil A Ail Bil Bl wih Bae wy ay 
Dh Tia fh Pah, Penh Jil, TE Pp, T3y 08) 7B8) Piss 13) Tay Te Di he 2 
Up ty Te Te DE ED. Te DR TG IB) BS) I) sy Mis By DP. 5, 
Ws) dy Psy Ws) GIS. Js NS) US ss A) Ws oy, PA) ST BL 
WT PL PEE PAL TEL hE ETE hss) Ps Meh Pho} is) hs} Pha} hs} 
29 29 29 29 29 30 30 30 30 30 30 30 31 35 36 36 36 40 
AO 40°43" -455 947 5051 


(a) Give the five-number summary of this distribution. 


(b) Draw a boxplot of these data. What is the shape of the distribution 
shown by the boxplot? Which features of the boxplot led you to this 
conclusion? Are any observations unusually small or large? 


Look again at the stemplot of travel times to work in New York in Example 2.3. 
The five-number summary for this distribution is 


» 13: 215° ALS. 85 


How shall we describe the variability of this distribution? The smallest and largest 
observations are extremes that don’t describe the variability of the majority of the 
data. The distance between the quartiles (the range of the center half of the data) is 
a more resistant measure of variability. This distance is called the interquartile range. 


The Interquartile Range (/QR) 
The interquartile range (/QR) is the distance between the first and third quartiles, 


HOI aC) ale), 


For our data on New York travel times, IQR = 42.5 — 15 = 27.5 min- 
utes. However, no single numerical measure of variability, such as IQR, is very 
useful for describing skewed distributions. The two sides of a skewed distribution 


*This short section is optional. 
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= ly 

NYTRAVEL 
FW pow mucu ts 
Wik THAT HOUSE 


WORTH? 
The town of Manhattan, Kansas, is 
sometimes called “the Little Apple” 
to distinguish it from that other 
Manhattan, “the Big Apple.” A few 
years ago, a house there appeared 
in the county appraiser’s records 
valued at $200,059,000. That would 
be quite a house even on Manhattan 
Island. As you might guess, the 
entry was wrong: the true value was 
$59,500. But before the error was 
discovered, the county, the city, and 
the school board had based their 
budgets on the total appraised value 
of real estate, which the one outlier 
jacked up by 6.5%. It can pay to spot 
outliers before you trust your data. 


FIGURE 2.2 

Horizontal modified boxplots comparing 
the travel times to work of samples of 
workers in North Carolina and New York. 


have different variability, so one number can’t summarize them. That’s why we give 
the full fivenumber summary. The interquartile range is mainly used as the basis 
for a rule of thumb for identifying suspected outliers. 


The 1.5 X /QR Rule For Outliers 


Call an observation a suspected outlier if it falls more than 1.5 X IQR above the 
third quartile or below the first quartile. 


For the New York travel time data, /QR = 27.5 and 
1.5 K IQR = 1.5 KX 27.5 = 41.25 
Any values not falling between 


Q, — (1.5 X IQR) = 15.0 — 41.25 = —26.25 and 
Q3 + (1.5 X IQR) = 42.5 + 41.25 = 83.75 


are flagged as suspected outliers. Look again at the stemplot in Example 2.3: The 
only suspected outlier is the longest travel time, 85 minutes. The 1.5 X /QR rule 
suggests that the three next-longest travel times (60 and 65 minutes) are just part 
of the long right tail of this skewed distribution. m 


In a modified boxplot, which is provided by many software packages, the sus- 
pected outliers are identified in the boxplot with a special plotting symbol such 
as an asterisk (*). Comparing Figure 2.2 with Figure 2.1, we see that the largest 
observation from New York is flagged as an outlier. The line beginning at the third 
quartile no longer extends to the maximum but now ends at 65, which is the largest 
observation from New York that is not identified as an outlier. Figure 2.2 also dis- 
plays the modified boxplots horizontally rather than vertically, an option available 
in some software packages which does not change the interpretation of the plot. 
Finally, the 1.5 X IQR rule is not a replacement for looking at the data. It is most 
useful when large volumes of data are processed automatically. 


a sale thh— 
North Carolina + | — 


) 20 40 60 80 100 
Minutes 
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Apply Your Knowledge 


2.8 Travel Time to Work. In Example 2.1, we noted the influence of one 
long travel time of 60 minutes in our sample of 15 North Carolina workers. 
Does the 1.5 X IQR rule identify this travel time as a suspected outlier? 


NCTRAVEL 


2.9 Fuel Economy for Midsize Cars. Exercise 2.7 gives the estimated miles 
per gallon (mpg) for city driving for the 205 cars classified as midsize in 2014. 
In that exercise we noted that several of the mpg values were unusually large. 
Which of these are suspected outliers by the 1.5 X IQR rule? Although outli- 
ers can be produced by errors or incorrectly recorded observations, they are 
often observations that differ from the others in some particular way. In this 
case, the cars producing the high outliers share a common feature. What do 
you think that is? £y).) mipcars 


The five-number summary is not the most common numerical description of a 
distribution. That distinction belongs to the combination of the mean to measure 
center and the standard deviation to measure variability. The standard deviation and 
its close relative, the variance, measure variability by looking at how far the observa- 
tions are from their mean. 


The Standard Deviation s 


The variance s? of a set of observations is an average of the squares of the devia- 
tions of the observations from their mean. In symbols, the variance of n observations 
X11 X72) +s 0) Xq is 


or, more compactly, 
1 
— me. 
v= % = 
n—-1 pat E ) 
The standard deviation s is the square root of the variance s?: 


s=4 +S (y — 3 


Tall 


In practice, use software or your calculator to obtain the standard deviation from 
keyed-in data. Doing an example step-by-step will help you understand how the 
variance and standard deviation work, however. 


Georgia Southern University had 3219 students with regular admission in its fresh- 

man class of 2013. For each student, data are available on their SAT and ACT all 1 
scores (if taken), high school GPA, and the college within the university to which 
they were admitted.® In Exercise 3.49 (page 98), the full data set for the SAT 
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Mathematics scores will be examined. Here are the first five observations from that 
data set: 


490 580 450 570 650 
We will compute x and s for these students. First find the mean: 


¥ 490 + 580 + 450 + 570 + 650 
5 


2740 
= —— = 548 
5 


Figure 2.3 displays the data as points above the number line, with their mean marked 
by an asterisk (*). The arrows mark two of the deviations from the mean. The devia- 
tions show how variable the data are about their mean. They are the starting point 
for calculating the variance and the standard deviation. 


Observations Deviations Squared deviations 
X; ii (p= ae 

490 490 — 548 = —58 (—58) = 3,364 

580 580 — 548 = 32 32? = 1,024 

450 450 — 548 = —98 (—98)? = 9,604 

570 570 — 548 = 22 22? = 484 

650 650 — 548 = 102 102? = 10,404 

sum = 0 sum = 24,880 


The variance is the sum of the squared deviations divided by one less than the num- 
ber of observations: 


1 


24,880 
s= xX, — X)? = ——— = 6220 
n=] > ( ) 4 
The standard deviation is the square root of the variance: 
Ss = V6220 = 78.87 6 
FIGURE 2.3 = 
SAT Critical Reading scores for five ae oa 0 
students, with their mean (*) and the Aint = deviation OD! 
deviations of two observations from 
the mean shown, for Example 2.7. 
e e * eo. e 
S E 22 oe a . 
2 + $8 wn LA in Xe) R 


SAT Critical Reading Score 


Notice that the “average” in the variance s’ divides the sum by one fewer than 
the number of observations, that is, n — 1 rather than n. The reason is that the 
deviations x; — x always sum to exactly 0, so that knowing n — 1 of them deter- 
mines the last one. Only n — 1 of the squared deviations can vary freely, and we 

degrees of freedom _ average by dividing the total by n — 1. The number n — | is called the degrees 
of freedom of the variance or standard deviation. Some calculators offer a choice 
between dividing by n and dividing by n — 1, so be sure to use n — 1. 


2.8 Choosing Measures of Center and Variability 59 


More important than the details of hand calculation are the properties that 
determine the usefulness of the standard deviation: 


= s measures variability about the mean and should be used only when the mean 
is chosen as the measure of center. 


= sis always zero or greater than zero. s = 0 only when there is no variability. This 
happens only when all observations have the same value. Otherwise, s > 0. As 
the observations become more variable about their mean, s gets larger. 


e s has the same units of measurement as the original observations. For example, if 
you measure weight in kilograms, both the mean x and the standard deviation 
s are also in kilograms. This is one reason to prefer s to the variance s’, which 
would be in squared kilograms. 


» Like the mean x, s is not resistant. A few outliers can make s very large. 


few extreme observations. For example, the standard deviation of the travel 

times for the 15 North Carolina workers in Example 2.1 is 15.23 minutes. 
(Use your calculator or software to verify this.) If we omit the high outlier, the 
standard deviation drops to 11.56 minutes. 

If you feel that the importance of the standard deviation is not yet clear, you are 
right. We will see in Chapter 3 that the standard deviation is the natural measure 
of variability for a very important class of symmetric distributions, the Normal dis- 
tributions. The usefulness of many statistical procedures is tied to distributions of 
particular shapes. This is certainly true of the standard deviation. 


A The use of squared deviations renders s even more sensitive than x to a 


We now have a choice between two descriptions of the center and variability of a 
distribution: the fivenumber summary, or X and s. Because x and s are sensitive to 
extreme observations, they can be misleading when a distribution is strongly skewed 
or has outliers. In fact, because the two sides of a skewed distribution have different 
variability, no single number describes the variability well. The five-number sum- 
mary, with its two quartiles and two extremes, does a better job. 


Choosing a Summary 


The five-number summary is usually better than the mean and standard deviation 
for describing a skewed distribution or a distribution with strong outliers. Use x and s 
only for reasonably symmetric distributions that are free of outliers. 


Outliers can greatly affect the values of the mean x and the standard 
deviation s, the most common measures of center and variability. Many 
more elaborate statistical procedures also can’t be trusted when outliers are 

present. Whenever you find outliers in your data, try to find an explanation for them. 
Sometimes the explanation is as simple as a typing error, such as typing 10.1 as 101; 
if this is the case, correct the typing error. Sometimes a measuring device broke 
down or a subject gave a frivolous response, like the student in a class survey 
who claimed to study 30,000 minutes per night. (Yes, that really happened.) In all 
these cases, you can simply remove the outlier from your data. When outliers are 
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“real data,” like the long travel times of some New York workers, you should choose 
statistical methods that are not greatly disturbed by the outliers. For example, 
use the five-number summary rather than x and s to describe a distribution with 
extreme outliers. We will encounter other examples later in the book. 
Remember that a graph gives the best overall picture of a distribution. If 
A data have been entered into a calculator or statistical program, it is very 
simple and quick to create several graphs to see all the different features 
of a distribution. Numerical measures of center and variability report specific 
facts about a distribution, but they do not describe its entire shape. Numerical 
summaries do not disclose the presence of multiple peaks or clusters, for example. 
Exercise 2.11 shows how misleading numerical summaries can be. Always plot 
your data. 


(4) LaunchPad Online Resources 


* The Snapshots video, Summarizing Quantitative Data, provides an overview 
of the need for measures of center and variability as well as some details of 
the computations. 


* The StatClips Examples video, Summaries of Quantitative Data Example 
C, gives the details for the computation of the mean, median, and standard 
deviation in a small example. You can verify the computations along with the 
video, either by hand or using your technology. 


* The StatClips Examples videos, Basic Principles of Exploring Data Example 
B and Basic Principles of Exploring Data Example C, emphasize the need to 
examine outliers and understand them, rather than simply discarding obser- 
vations that don’t seem to fit. 


Apply Your Knowledge 


2.10 X and s by Hand. Radon is a naturally occurring gas and is the second 
leading cause of lung cancer in the United States.’ It comes from the natural 
breakdown of uranium in the soil and enters buildings through cracks and 
other holes in the foundations. Found throughout the United States, levels 
vary considerably from state to state. Several methods can reduce the levels 
of radon in your home, and the Environmental Protection Agency recom- 
mends using one of these if the measured level in your home is above 4 pico- 
curies per liter. Four readings from Franklin County, Ohio, where the county 
average is 8.4 picocuries per liter, were 6.2, 12.8, 7.6, and 15.4. 


(a) Find the mean step-by-step. That is, find the sum of the four observations 
and divide by 4. 


(b) Find the standard deviation step-by-step. That is, find the deviation 
of each observation from the mean, square the deviations, then 
obtain the variance and the standard deviation. Example 2.7 shows 
the method. 


(c) Now enter the data into your calculator and use the mean and standard 
deviation buttons to obtain x and s. Do the results agree with your hand 
calculations? 


2.11 x ands Are Not Enough. The mean x and standard deviation s measure 
center and variability but are not a complete description of a distribution. Data 


sets with different shapes can have the same mean and standard deviation. 
To demonstrate this fact, use your calculator to find x and s for these two 
small data sets. Then make a stemplot of each and comment on the shape of 
each distribution. £gj)) pataset2 


Data A: |9.14 8.14 8.74 8.77 9.26 8.10 6.13 3.10 9.13 7.26 4.74 
Data B: |6.58 5.76 7.71 8.84 8.47 7.04 5.25 5.56 7.91 6.89 12.50 


2.12 Choose aSummary. The shape of a distribution is a rough guide to wheth- 
er the mean and standard deviation are a helpful summary of center and 
variability. For which of the following distributions would x and s be useful? 
In each case, give a reason for your decision. 


(a) Percents of high school graduates in the states taking the SAT; 
Figure 1.8 (page 27) 


(b) Iowa Tests scores, Figure 1.7 (page 27) 
(c) New York travel times, Figure 2.1 (page 54) 


Although a calculator with “two-variable statistics” functions will do the basic 
calculations we need, more elaborate tools are helpful. Graphing calculators and 
computer software will do calculations and make graphs as you command, freeing 
you to concentrate on choosing the right methods and interpreting your results. 
Figure 2.4 displays output describing the travel times to work of 20 people in New 
York State (Example 2.3). Can you find x, s, and the five-number summary in each 
output? The big message of this section is: Once you know what to look for, you can 
read output from any technological tool. 

The displays in Figure 2.4 come from a Texas Instruments graphing calcula- 
tor, the Minitab, Crunchlt!, and JMP statistical programs, and the Microsoft 
Excel spreadsheet program. Minitab and JMP allow you to choose what descrip- 
tive measures you want, whereas the descriptive measures in the Crunchlt! out- 
put are provided by default. Excel and the calculator give some things we don’t 
need. Just ignore the extras. Excel’s “Descriptive Statistics” menu item doesn’t 
give the quartiles. We used the spreadsheet’s separate quartile function to get 


Q; and Q. 


In Example 2.5, we saw that the quartiles of the New York travel times are 
A Q; = 15 and Q; = 42.5. Look at the output displays in Figure 2.4. The 

calculator, Crunchlit!, and Excel agree with our work. Minitab and JMP say 
that Q; = 43.75. What happened? There are several rules for finding the quartiles. 
Some calculators and software use rules that give results different from ours for some 
sets of data. This is true of Minitab, JMP, and also Excel, though Excel agrees with 
our work in this example. Results from the various rules are always close to each 
other, so the differences are never important in practice. Our rule is the simplest for 
hand calculation. = 


2.9 Using Technology 
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Texas Instruments Graphing Calculator 


i-Var Stats “Var Stats 
R=31,25 a 


2x=625 
2x*=28625 


$x=21,8773495 
raat #21, 32340264 
n=? 


Minitab 


Descriptive Statistics: NYtime 


Total 


variable Count Mean StDev Variance Minimum Ql Median Q3 Maximum 
NYtime 20° 31..25 21.88 478.62 5.00) 15.00) 22.50 43.75 85.00 


Crunchlt! 


Results - Descriptive Statistics a||x 


Export* 


| | | Sample Mean| Standard Deviation /Min{Q1|Median}a3_ | Max‘ 
sas ets 2250 [42.50/85 | 


Microsoft Excel 


minutes Pp —(‘“‘si‘iSC“ 
Se ae | 

3r.25] rt tt~—“‘SC#*dSCONC‘édC CWS 

4.891924064 es es 8 

| QUARTILE(A2:A21,1)| 15 | | 

5 | QUARTILE (A2:A21,3)| 42-5 | | 

‘nf 

47g.6re42i1|—i—“‘“‘(SC*rLSOCC*édC CL 

o.329ee4iz6| —s—“‘iLSCd CS 

z.o4oi1tos36| ——“‘“CSC*rLSOC#*dCL*C 
ee ee 
s[ t—“‘mLSCCisCC*@dS 
ast CO 
PS rt—“t*‘“C;C*s*s*s*S*S:S~SsdrSC“‘(SC*d*d 
—— _) 
—_——————SSSSS_=SE==—=E_—— =__ BS 

al 

JMP Output 
A\) Distributions 
A\=)NYtime 
4) Quantiles 


maximum 85 
quartile 43.75 
median 22.5 


quartile 15 
minimum 5 
FIGURE 2.4 A= Summary Statistics 


Output from a graphing calculator, three 


statistical software packages, and a Dee ae 
spreadsheet program describing the : 20 


data on travel times to work in New 
York State. 
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(4) LaunchPad Online Resources 
There are online resources to help you use technology. 
* The software Crunchlt! is available online, as are Crunchlt! help videos. 


* The Video Technology Manuals for Minitab, the Tl graphing calculator, 
Excel, JMP, and SPSS provide explicit instructions for producing output similar 
to that provided in Figure 2.4 for the particular technology you are using. 


Most of our examples and exercises have aimed to help you learn basic tools 
(graphs and calculations) for describing and comparing distributions. You have also 
learned principles that guide use of these tools, such as “start with a graph” and 
“look for the overall pattern and striking deviations from the pattern.” The data you 
work with are not just numbers—they describe specific settings such as water depth 
in the Everglades or travel time to work. Because data come from a specific setting, 
the final step in examining data is a conclusion for that setting. Water depth in the 
Everglades has a yearly cycle that reflects Florida’s wet and dry seasons. Travel times 
to work are generally longer in New York than in North Carolina. 

Let’s return to the on-time high school graduation rates discussed in Example 
1.4, page 21. We know from the example that the on-time graduation rates vary 
from 59% in the District of Columbia to 88% in Iowa, with a median of 80%. State 
graduation rates are related to many factors, and in a statistical problem we often try 
to explain the differences or variation in a variable such as graduation rate by some 
of these factors. For example, do states with lower household incomes tend to have 
lower high school graduation rates? Or, do the states in some regions of the country 
tend to have lower high school graduation rates than in other regions? 

As you learn more statistical tools and principles, you will face more complex sta- 
tistical problems. Although no framework accommodates all the varied issues that 
arise in applying statistics to real settings, we find the following four-step thought 
process gives useful guidance. In particular, the first and last steps emphasize that 
statistical problems are tied to specific real-world settings and therefore involve 
more than doing calculations and making graphs. 


Organizing a Statistical Problem: 


A Four-Step Process 


STATE: What is the practical question, in the context of the real-world setting? 
PLAN: What specific statistical operations does this problem call for? 

SOLVE: Make the graphs and carry out the calculations needed for this problem. 
CONCLUDE: Give your practical conclusion in the setting of the real-world problem. 


To help you master the basics, many exercises will continue to tell you what to 
do—make a histogram, find the fivenumber summary, and so on. Real statistical 
problems don’t come with detailed instructions. From now on, especially in the 
later chapters of the book, you will meet some exercises that are more realistic. Use 
the four-step process as a guide to solving and reporting these problems. They are 
marked with the four-step icon, as the following example illustrates. 
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DATA 


GRADRATE 


FIGURE 2.5 

Stemplots comparing the distribu- 
tions of graduation rates for the four 
census regions from Table 1.1, for 
Example 2.9. 


(STATE?) Federal law requires all states in the United States to use a common com- 
putation of on-time high school graduation rates beginning with the 2010-11 school 
year. Previously, states chose one of several computation methods that gave answers 
that could differ by more than 10%. This common computation allows for meaningful 
comparison of graduation rates between the states. 

We know from Table 1.1 (page 22) that the on-time high school graduation rates 
varied from 59% in the District of Columbia to 88% in lowa. The U.S. Census Bureau 
divides the 50 states and the District of Columbia into four geographical regions: the 
Northeast (NE), Midwest (MW), South (S), and West (W). The region for each state is 
included in Table 1.1. Do the states in the four regions of the country display distinct 
distributions of graduation rates? How do the mean graduation rates of the states in 
each of these regions compare? 


(PLANS) Use graphs and numerical descriptions to describe and compare the distri- 
butions of on-time high school graduation rates of the states in the four regions of 
the United States. 


SORTER We might use boxplots to compare the distributions, but stemplots pre- 
serve more detail and work well for data sets of these sizes. Figure 2.5 displays the 
stemplots with the stems lined up for easy comparison. The stems have been split 
to better display the distributions. The stemplots overlap, and some care is needed 
when comparing the four stemplots as the sample sizes differ, with some stemplots 
having more leaves than others. None of the plots shows strong skewness, although 
the South has one low observation that stands apart from the others with this choice 
of stems. The states in the Northeast and Midwest have distributions that are similar 
to each other, as do those in the South and West. The graduation rates tend to be 
higher for the states in the Northeast and Midwest and more variable for the states 
in the South and West. With little skewness and no serious outliers, we report X and 
S as Our Summary measures of center and variability of the distribution of the on-time 
graduation rates of the states in each region: 


Region Mean Standard Deviation 

Midwest 82.92 4,25 

Northeast 82.56 3.47 

South 75.93 7.36 

West 73.58 6.73 
Midwest Northeast South West 
&|66678 8) 67 &| 66 8 
8'01334 8) 33334 &8/}123 &8|002 
7|7 7|77 7|5688 7|6668 
7|4 7 71/1124 7|4 
6 6 6|7 6|88 
6 6 6 6| 23 
5 5 5| 9 5 


(FSOMSEUDSH The table of summary statistics confirms what we see in the stem- 
plots. The states in the Midwest and Northeast are quite similar to each other, as are 
those in the South and West. The states in the Midwest and Northeast have a higher 
mean graduation rate as well as a smaller standard deviation than those in the South 
and West. = 
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It is important to remember that the individuals in Example 2.9 are the states. 
For example, the mean of 82.56 is the mean of the on-time graduation rates for 
the nine Northeastern states, and the standard deviation tells us how much these 
state rates vary about this mean. However, the mean of these nine states is not the 
same as the graduation rate for all high school students in the Northeast, unless 
the states have the same number of high school graduates. The graduation rate for 
all high school students in the Northeast would be a weighted average of the state 
rates, with the larger states receiving more weight. For example, since New York is 
the most populous state in the Northeast and also has the lowest graduation rate, 
we would expect the graduation rate of all high school students in the Northeast 
to be lower than 82.56, as New York would pull down the overall graduation rate. 
See Exercise 2.37 for a similar example. 


(3) LaunchPad Online Resources 


The StatBoards video, The 4-Step Process, provides an additional example of 
the four-step process. Many exercises throughout the book are marked with the 
four-step icon and this process will be necessary in solving these problems. If you 
are unclear about any of the steps, you should watch the video to become more 
comfortable with the process. 


Apply Your Knowledge 


2.13 Logging in the Rain Forest. “Conservationists have despaired over 
destruction of tropical rain forest by logging, clearing, and burning.” These 42 
words begin a report on a statistical study of the effects of logging in Bor- 
neo.!° Charles Cannon of Duke University and his co-workers compared 
forest plots that had never been logged (Group 1) with similar plots nearby 
that had been logged one year earlier (Group 2) and eight years earlier 
(Group 3). All plots were 0.1 hectare in area. Here are the counts of trees 
for plots in each group: LOGGING 


Grol 27 22 WW Bl WW 33 io 2O Hb ay ws ie 
Group2: 12 12 15 9 20 18 17 14° 14 2 ii 1 
Group 3: 18 a hy ihes I) le 


To what extent has logging affected the count of trees? Follow the four-step 
process in reporting your work. 


the thousands of foreign diplomats in New York City could freely violate 
parking laws. Two economists looked at the number of unpaid parking tickets 
per diplomat over a five-year period ending when enforcement reduced the 
problem.!! They concluded that large numbers of unpaid tickets indicated a 
“culture of corruption” in a country and lined up well with more elaborate 
measures of corruption. The data set for 145 countries is too large to print 
here, but look at the data file on the text website. The first 32 countries in 
the list (Australia to Trinidad and Tobago) are classified by the World Bank as 
“developed.” The remaining countries (Albania to Zimbabwe) are “develop- 
ing.” The World Bank classification is based only on national income and 
does not take into account measures of social development. SCOFFLAW 


2.14 Diplomatic Scofflaws. Until Congress allowed some enforcement in 2002, 4. 


Give a full description of the distribution of unpaid tickets for both 
groups of countries and identify any high outliers. Compare the two groups. 
Does national income alone do a good job of distinguishing countries whose 
diplomats do and do not obey parking laws? 


© James Leynse/Corbis 
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CHAPTER 2 SUMMARY 


Chapter Specifics 


= A numerical summary of a distribution should report at least its center and its variability. 

m The mean xX and the median M describe the center of a distribution in different ways. 
The mean is the arithmetic average of the observations, and the median is the mid- 
point of the values. 

m When you use the median to indicate the center of the distribution, describe its vari- 
ability by giving the quartiles. The first quartile, Q;, has one-fourth of the observations 
below it, and the third quartile, Q;, has three-fourths of the observations below it. 

mu The five-enumber summary consisting of the median, the quartiles, and the smallest 
and largest individual observations provides a quick overall description of a distribution. 
The median describes the center, and the quartiles and extremes show the variability. 

= Boxplots based on the five-number summary are useful for comparing several distribu- 
tions. The box spans the quartiles and shows the variability of the central half of the 
distribution. The median is marked within the box. Lines extend from the box to the 
extremes and show the full variability of the data. 

m The variance s’ and especially its square root, the standard deviation s, are common 
measures of variability about the mean as center. The standard deviation s is zero when 
there is no variability and gets larger as the variability increases. 

m= A resistant measure of any aspect of a distribution is relatively unaffected by changes 
in the numerical value of a small proportion of the total number of observations, no 
matter how large these changes are. The median and quartiles are resistant, but the 
mean and the standard deviation are not. 

m The mean and standard deviation are good descriptions for symmetric distributions 
without outliers. They are most useful for the Normal distributions introduced in the 
next chapter. The five-number summary is a better description for skewed distributions. 

m= Numerical summaries do not fully describe the shape of a distribution. Always plot your data. 

m A statistical problem has a real-world setting. You can organize many problems using 
the following four steps: State, Plan, Solve, and Conclude. 


Link It 


In this chapter we have continued our study of exploratory data analysis. Graphs are an impor- 
tant visual tool for organizing and identifying patterns in data. They give a fairly complete de- 
scription of a distribution, although for many problems the important information in your data 
can be described by a few numbers. These numerical summaries can be useful for describing a 
single distribution as well as for comparing the distributions from several groups of observations. 
Two important features of a distribution are the center and the variability. For distributions 
that are approximately symmetric without outliers, the mean and standard deviation are impor- 
tant numeric summaries for describing and comparing distributions. But if the distribution is not 
symmetric and/or has outliers, the five-number summary often provides a better description. 
The boxplot gives a picture of the five-number summary that is useful for a simple comparison 
of several distributions. Remember that the boxplot is based only on the five-number summary and 
does not have any information beyond these five numbers. Certain features of a distribution that 
are revealed in histograms and stemplots will not be evident from a boxplot alone. These include 
gaps in the data and the presence of several peaks. You must be careful when reducing a distribu- 
tion to a few numbers to make sure that important information has not been lost in the process. 


(9) LaunchPad Online Resources 
If you are having difficulty with any of the sections of this chapter, these online 
resources should help prepare you to solve the exercises at the end of this chapter. 


* StatTutor starts with a video review of each section and asks a series of 
questions to check your understanding. 


* LearningCurve provides you with a series of questions about the chapter 
geared to your level of understanding. 


CHECK YOUR SKILLS 


2.15 


CHAPTER 2 EXERCISES 


The respiratory system can be a limiting factor in maxi- 
mal exercise performance. Researchers from the United 
Kingdom studied the effect of two breathing frequencies 
on both performance times and several physiological 
parameters in swimming.’” Subjects were 10 male colle- 
giate swimmers. Here are their times in seconds to swim 
200 meters at 90% of race pace when breathing every 
second stroke in front crawl swimming: £j,,) swimtiMe 


151.6 165.1 159.2 163.5 174.8 173.2 177.6 
174.3 164.1 171.4 

The mean of these data is 

(a) 165.10. (b) 167.48. (c) 168.25. 

The median of the data in Exercise 2.15 is 

(a) 167.48. (b) 168.25. (c) 174.00. 


The five-number summary of the data in Exercise 2.15 is 


(a) 151.6, 163.5, 167.48, 174.3, 177.6. 
(b) 151.6, 163.5, 168.25, 174.3, 177.6. 
(c) 151.6, 159.2, 168.25, 174.3, 177.6. 


If a distribution is skewed to the left, 


(a) the mean is less than the median. 
(b) the mean and median are equal. 
(c) the mean is greater than the median. 


2.25 Incomes of college grads. According to the U.S. Cen- 


2.26 


2.27 


sus Bureau’s Current Population Survey, the mean and 
median 2012 income of people at least 25 years old 
who had a bachelor’s degree but no higher degree were 
$50,281 and $62,597. Which of these numbers is the 


mean and which is the median? Explain your reasoning. 


Saving for retirement. Retirement seems a long way 
off, and we need money now, so saving for retirement is 
hard. Once every three years, the Board of Governors of 
the Federal Reserve System collects data on household 
assets and liabilities through the Survey of Consumer 
Finances (SCF). The most recent such survey was con- 
ducted in 2010, and the survey results were released to 
the public in April 2013. The survey presents data on 
retirement assets, which include defined contribution 
and Individual Retirement Account (IRA) balances. 
For married households the mean value per household 
is $123,968, but the median value is just $10,000. For 
single households, the mean is $33,585, and the median 
is $0.4 What explains the differences between the two 
measures of center, both for married and single house- 
holds? What does a median of $0 say about the percent- 
age of single households with retirement assets? 


University endowments. The National Association of 
College and University Business Officers collects data 


Cae) 
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2.21 


2.22 


2.23 


2.24 


2.28 


Exercises 
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What percent of the observations in a distribution are 
greater than the third quartile? 

(a) 25% (b) 50% (c) 75% 

To make a boxplot of a distribution, you must know 
(a) all the individual observations. 

(b) the mean and the standard deviation. 

(c) the five-number summary. 

The standard deviation of the 10 swim times in Exer- 
cise 2.15 (use your calculator) is about 

(a) 7.4. (b) 7.8. (@) th. 

What are all the values that a standard deviation s can 
possibly take? 

(OSs (b)0Ss1 
The correct units for the standard deviation in Exer- 
ciser ware 


(et a 


(b) seconds. 


(a) no units—it’s just a number. 
(c) seconds squared. 

Which of the following is least affected if an extreme 
high outlier is added to your data? 

(a) The median (b) The mean 

(c) The standard deviation 


on college endowments. In 2013, its report included the 
endowment values of 849 colleges and universities in 
the United States and Canada. When the endowment 
values are arranged in order, what are the locations of 
the median and the quartiles in this ordered list? 


Pulling wood apart. Exercise 1.46 (page 45) gives 
the breaking strengths of 20 pieces of Douglas fir. 
[y,) Wood 


ATA 


(a) Give the five-number summary of the distribution 
of breaking strengths. 

(b) Here is a stemplot of the data rounded to the 
nearest hundred pounds. The stems are thousands of 
pounds, and the leaves are hundreds of pounds. 
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The stemplot shows that the distribution is skewed 
to the left. Does the five-number summary show the 
skew? Remember that only a graph gives a clear pic- 
ture of the shape of a distribution. 


Comparing graduation rates. An alternative pre- 
sentation to compare the graduation rates in Table 
1.1 (page 22) by region of the country reports five- 
number summaries and uses boxplots to display the 
distributions. Do the boxplots fail to reveal any impor- 
tant information visible in the stemplots of Figure 2.5? 
Which plots make it simpler to compare the regions? 
Why? £iy)) GRapraTe 


How much fruit do adolescent girls eat? Figure 
1.14 (page 39) is a histogram of the number of serv- 
ings of fruit per day claimed by 74 seventeen-year- 
old girls. 


(a) With a little care, you can find the median and 
the quartiles from the histogram. What are these 
numbers? How did you find them? 

(b) You can also find the mean number of servings 

of fruit claimed per day from the histogram. First use 
the information in the histogram to compute the sum 
of the 74 observations, and then use this to compute 
the mean. What is the relationship between the mean 
and median? Is this what you expected? 

(c) In general, you cannot find the exact values of the 
median, quartiles, or mean from the histogram. What 
is special about the histogram of the number of serv- 
ings of fruit that allows you to do this? 


Guinea pig survival times. Here are the survival times 
in days of 72 guinea pigs after they were injected with 
infectious bacteria in a medical experiment.!° Survival 
times, whether of machines under stress or cancer pa- 
tients after treatment, usually have distributions that 
are skewed to the right. &j)| Guinpias 


45) 53} 5X) XS) AY 


80 80 81 81 81 82 83 83 84 88 89 91 
91 92 92 97 99 99 100 100 101 102 102 102 
103 104 107 108 109 113 114 118 121 123 126 128 
137 138 139 144 145 147 156 162 174 178 179 184 
191 198 211 214 243 249 329 380 403 511 522 598 
(a) Graph the distribution and describe its main 
features. Does it show the expected right-skew? 
(b) Which numerical summary would you choose for 
these data? Calculate your chosen summary. How 
does it reflect the skewness of the distribution? 
2.32 Maternal age at childbirth. How old are women when 


they have their first child? Here is the distribution of 
the age of the mother for all firstborn children in the 
United States in 2012:'° 


AGE COUNT AGE COUNT 
10 to 14 years 3,578 30 to 34 years 299,857 
15 to 19 years 251,022 35 to 39 years 106,892 
20 to 24 years 461,553 40 to 44 years 24,251 
25 to 29 years 421,704 45 to 49 years 1,952 

= 

Es 

é 


2.33 


The number of firstborn children to mothers under 10 
or over 50 years of age represent a negligible percent- 
age of all first births, and are not included in the table. 


(a) For comparison with other years and with other 
countries, we prefer a histogram of the percents in 
each age class rather than the counts. Explain why. 
(b) How many babies were there? 

(c) Make a histogram of the distribution, using 
percents on the vertical scale. Using this histogram, 
describe the distribution of the age at which women 
have their first child. 

(d) What are the locations of the median and quar- 
tiles in the ordered list of all maternal ages? In which 
age classes do the median and quartiles fall? 


More on Nintendo and laparoscopic surgery. In Ex- 
ercise 1.38 (page 43) you examined the improvement 
in times to complete a virtual gall bladder removal for 
those with and without four weeks of Nintendo Wii™ 
training. The most common methods for formal com- 
parison of two groups use x and s to summarize the 
data. £jjj)| NINTENDO 


(a) What kinds of distributions are best summarized 
by x and s? Do you think these summary measures are 
appropriate in this case? 

(b) In the control group, one subject improved their 
time by 229 seconds. How much does removing this 
observation change x and s for the control group? You 
will need to compute x and s for the control group, 
both with and without the high outlier. 


2.34 


2.35 


2.36 


(c) Compute the median for the control group with 
and without the high outlier. What does this show 
about the resistance of the median and x? 


Making resistance visible. In the Mean and Median 
applet, place three observations on the line by clicking 
below it: two close together near the center of the line, 
and one somewhat to the right of these two. 


(a) Pull the single rightmost observation out to the 
right. (Place the cursor on the point, hold down a 
mouse button, and drag the point.) How does the 
mean behave? How does the median behave? Explain 
briefly why each measure acts as it does. 

(b) Now drag the single rightmost point to the left 

as far as you can. What happens to the mean? What 
happens to the median as you drag this point past the 
other two (watch carefully)? 


Behavior of the median. Place five observations on 
the line in the Mean and Median applet by clicking 
below it. 


(a) Add one additional observation without changing 
the median. Where is your new point? 

(b) Use the applet to convince yourself that when you 
add yet another observation (there are now seven in 
all), the median does not change, no matter where you 
put the seventh point. Explain why this must be true. 


Never on Sunday: also in Canada? Exercise 1.5 (page 
20) gives the number of births in the United States 


2.37 


2.38 
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on each day of the week during an entire year. The 
boxplots in Figure 2.6 are based on more detailed data 
from Toronto, Canada: the number of births on each 
of the 365 days in a year, grouped by day of the week."” 
Based on these plots, compare the day-of-the-week 
distributions using shape, center, and variability. Sum- 
marize your findings. 


Thinking about means. Table 1.2 (page 24) gives the 
percent of foreign-born residents in each of the states. 
For the nation as a whole, 13.0% of residents are for- 
eign-born. Find the mean of the 51 entries in Table 
1.2. It is not 13.0%. Explain carefully why this hap- 
pens. (Hint: The states with the largest populations are 
California, Texas, New York, and Florida. Look at their 
entries in Table 1.2.) jj) FOREIGN 


Thinking about medians. A report says that “the me- 
dian credit card debt of American households is zero.” 
We know that many households have large amounts 
of credit card debt. In fact, the mean household credit 
card debt is close to $8000. Explain how the median 
debt can nonetheless be zero. 


Thinking about means and medians. In 2012, ap- 
proximately 5% of employed wage and salary workers 
were being paid hourly rates at the federal minimum 
wage level. Would federal legislation to increase the 
minimum wage have a greater effect on the mean or 
the median income of all workers. Explain your answer. 


Saturday Sunday 


Boxplots of the distributions of numbers of births in Toronto, Canada, on each day of the 


week during a year, for Exercise 2.36. 
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A standard deviation contest. This is a standard de- 
viation contest. You must choose four numbers from 
the whole numbers 0 to 10, with repeats allowed. 


(a) Choose four numbers that have the smallest 
possible standard deviation. 

(b) Choose four numbers that have the largest pos- 
sible standard deviation. 

(c) Is more than one choice possible in either (a) or 
(b)? Explain. 


You create the data. Create a set of seven numbers 
(repeats allowed) that have the five-number summary 


4Q,=7 M=12 Q; = 14 Maximum = 19 


There is more than one set of seven numbers 
with this five-number summary. What must be true 
about the seven numbers to have this five-number 
summary? 


You create the data. Give an example of a small set 
of data for which the mean is smaller than the first 
quartile. 


Adolescent obesity. Adolescent obesity is a seri- 
ous health risk affecting more than 5 million young 
people in the United States alone. Laparoscopic 
adjustable gastric banding has the potential to pro- 
vide a safe and effective treatment. Fifty adolescents 
between 14 and 18 years old with a body mass in- 
dex (BMI) higher than 35 were recruited from the 
Melbourne, Australia, community for the study.'® 
Twenty-five were randomly selected to undergo gas- 
tric banding, and the remaining twenty-five were as- 
signed to a supervised lifestyle intervention program 
involving diet, exercise, and behavior modification. 
All subjects were followed for two years. Here are 
the weight losses in kilograms for the subjects who 
completed the study: 


Eli GASTRIC 


=| 
ei 


GASTRIC BANDING 


35.6 814. 576 32.8 mn 
36.5 -54 279 49.0 64.8 39.0 
43.0 33.9 29.7 202 15.2 41.7 
53.4 134 248 19.4 ns 
LIFESTYLE INTERVENTION 
6.0 In 30 106 116 15.5 
-17.0 i. ne OE Tums 34.6 
6.0 an 43 io? =18 12.8 


(a) In the context of this study, what do the negative 
values in the data set mean? 

(b) Give a graphical comparison of the weight loss 
distribution for both groups using side-by-side box- 
plots. Provide appropriate numerical summaries for 
the two distributions and identify any high outliers in 
either group. What can you say about the effects of 
gastric banding versus lifestyle intervention on weight 
loss for the subjects in this study? 

(c) The measured variable was weight loss in kilograms. 
Would two subjects with the same weight loss always 
have similar benefits from a weight-reduction program? 
Does it depend on their initial weights? Other variables 
considered in this study were the percent of excess 
weight lost and the reduction in BMI. Do you see any 
advantages to either of these variables when comparing 
weight loss for two groups? 

(d) One subject from the gastric-banding group 
dropped out of the study, and seven subjects from the 
lifestyle group dropped out. Of the seven dropouts in 
the lifestyle group, six had gained weight at the time 
they dropped out. If all subjects had completed the 
study, how do you think it would have affected the 
comparison between the two groups? 


Exercises 2.44 to 2.49 ask you to analyze data without having the 
details outlined for you. The exercise statements give you the State 
step of the four-step process. In your work, follow the Plan, Solve, 
and Conclude steps as illustrated in Example 2.9 (page 64). 


2.44 Athletes’ salaries. The Montreal Canadiens were 

founded in 1909 and are the longest continuously 
4s operating professional ice hockey team. They have 
won 24 Stanley Cups, making them one of the most 
successful professional sports teams of the traditional 
four major sports of Canada and the United States. 
Table 2.1 (page 71) gives the salaries of the 2013-14 
roster. — Pro- 
vide the team 
owner with a 
full description 
of the distribu- 


3 

tion of salaries 5 
: < 

and a_ brief = 
summary of its 5 
_ n 

most —impor- 5 
& 

tant features. fe) 


HOCKEY 


Returns on stocks. How well have stocks done over 
the past generation? The Wilshire 5000 index describes 
the average performance of all U.S. stocks. The aver- 
age is weighted by the total market value of each com- 
pany’s stock, so think of the index as measuring the 


2.45 
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PLAYER SALARY PLAYER SALARY PLAYER SALARY 
Andrei Markov $5,750,000 | Brandon Prust $2,500,000 | George Parros $950,000 
Carey Price $5,750,000 | Alexei Emelin $2,000,000 | Alex Galchenyuk $925,000 
Brian Gionta $5,000,000 | Travis Moen $1,850,000 | Yannick Weber $900,000 
Tomas Plekanec $5,000,000 | Francis Bouillon $1,500,000 | Brendan Gallagher $715,000 
Rene Bourque $4,000,000 | Douglas Murray $1,500,000 | Ryan White $700,000 
Daniel Briere $4,000,000 | Lars Eller $1,500,000 | Michael Bournival $690,000 
Max Pacioretty $4,000,000 | Peter Budaj $1,400,000 | Michael Blunden $575,000 
P K. Suban $3,750,000 | Raphael Diaz $1,250,000 | Petteri Nokelainen $575,000 
David Desharnais $3,500,000 | Colby Armstrong $1,000,000 


performance ofthe average investor. Hereare thepercent 
returns on the Wilshire 5000 index for the years from 


1971 to 2013: Sy) witsHiRe 
YEAR RETURN YEAR RETURN YEAR RETURN 
1971 16.19 1986 15.61 2001 —10.97 
1972 17.34 1987 eS 2002 —20.86 
1973 —18.78 1988 17.59 2003 31.64 
1974 =Zilsevll 1989 28.53 2004 12.48 
1975 37.38 1990 =(6{0)5} 2005 6.38 
1976 26.77 1991 BBS 2006 Bit 
1977 = 1992 9.02 2007 5.62 
1978 8.54 1993 10.67 2008 = 57/3) 
1979 24.40 1994 0.06 2009 28.30 
1980 SB) It 1995 36.41 2010 17.16 
1981 —3.98 1996 21.56 2011 0.98 
1982 20.43 1997 31.48 2012 16.06 
1983 Oa 1998 24.31 2012 16.06 
1984 ST 1999 24.23 2013 33.07 
1985 31.46 2000 —10.89 

What can you say about the distribution of yearly re- 

turns on stocks? 
2.46 Do good smells bring good business? Businesses 


know that customers often respond to background 
music. Do they also respond to odors? Nicolas 


2.47 


Guéguen and his colleagues studied this question 
in a small pizza restaurant in France on Saturday 
evenings in May. On one evening, a relaxing lav- 
ender odor was spread through the restaurant; on 
another evening, a stimulating lemon odor; a third 
evening served as a control, with no odor. Table 2.2 
(page 72) shows the amounts (in euros) that cus- 
tomers spent on each of these evenings.”” Compare 
the three distributions. Were both odors associated 
with increased customer spending? & 


Good weather and tipping. Favorable weather has 
been shown to be associated with increased tipping. 
Will just the belief that future weather will be favor- 
able lead to higher tips? The researchers gave 60 
index cards to a waitress at an Italian restaurant in 
New Jersey. Before delivering the bill to each custom- 
er, the waitress randomly selected a card and wrote 
on the bill the same message that was printed on 
the index card. Twenty of the cards had the message 
“The weather is supposed to be really good tomor- 
row. I hope you enjoy the day!” Another 20 cards 
contained the message “The weather is supposed to 
be not so good tomorrow. I hope you enjoy the day 
anyway!” The remaining 20 cards were blank, indi- 
cating that the waitress was not supposed to write 
any message. Choosing a card at random ensured 
that there was a random assignment of the customers 
to the three experimental conditions. Here are the 
tip percents for the three messages: TIPPING 


Good weather report: Oss See) Ss IO) ES AS) HONS} 
24.9 jhe) 27.0 20.4 Dey 24.0 Daley) DA DA) ell 
Bad weather report: 18.0 19.0 19.2 18.8 18.4 19.0 18.5 161 168 14.0 
17.0 13.6 IED 19.9 20.2 18.8 18.0 Dey) 18.2 19.4 
No weather report: 19.9 16.0 15.0 20.1 19.3 19.2 18.0 19.2 Dalled; 18.8 
ices) 1U)3I)3} 194 108 19.1 19.7 19.8 213 20.6 


72 


2.48 


CHAPTER 2 &@ Describing Distributions with Numbers 


NO ODOR 
15.9 18.5 15.9 18.5 18.5 21.9 15.9 15.9 15.9 15.9 
15.9 18.5 18.5 18.5 20.5 18.5 18.5 15.9 15.9 15.9 
18.5 18.5 15.9 18.5 15.9 18.5 15.9 255) 12.9 15.9 
LEMON ODOR 
18.5 15.9 18.5 18.5 18.5 15.9 18.5 15.9 18.5 18.5 
15.9 18.5 25 15.9 FS) 15.9 18.5 183.5 18.5 18.5 
25.9 15.9 15.9 15.9 18.5 18.5 18.5 18.5 
LAVENDER ODOR 
21.9 18.5 Hike) 21.9 18.5 24.9 18.5 NS DAES 21.9 
alee 18.5 259) 18.5 18.5 ZS) 18.5 18.5 24.9 21.9 
25.9 21.9 18.5 18.5 22.8 18.5 21.9 20.7 21.9 PS, 
Compare the three distributions. How did the tip per- 2.49 Cholesterol levels and age. The National Health and 


cents vary with the weather report information? 


Does playing video games improve surgical skill? In 
laparoscopic surgery a video camera and several thin in- 
struments are inserted into the patient’s abdominal cav- 
ity. The surgeon uses the image from the video camera 
positioned inside the patient’s body to perform the pro- 
cedure by manipulating the instruments that have been 
inserted. The Top Gun Laparoscopic Skills and Sutur- 
ing Program was developed to help surgeons develop 
the skill set necessary for laparoscopic surgery. Because 
of the similarity in many of the skills involved in video 
games and laparoscopic surgery, it was hypothesized 
that surgeons with greater prior video game experience 
might acquire the skills required in laparoscopic surgery 
more easily. Thirty three surgeons participated in the 
study and were classified into the three categories—never 
used, under three hours, and more than three hours— 
depending on the number of hours 


Nutrition Examination Survey (NHANES) is a unique 
survey that combines interviews and physical exami- 
nations.” It includes basic demographic information, 
questions about topics such as diet, physical activity, 
and prescription medications, as well as the results of a 
physical examination measuring a variety of variables, 
including blood pressure and cholesterol levels. The 
program began in the early 1960s, and the survey cur- 
rently examines a nationally representative sample of 
about 5000 persons each year. You will work with the 
total cholesterol measurements (mg/dL) obtained from 
participants in the survey in 2009-10. CHOLEST 


To examine changes in cholesterol with age, we 
consider only the 3044 participants between 20 and 
50 years of age and have classified them into the three 
age categories 20s, 30s, and 40s. The full data set is too 
large to print here, but here are the first 10 individuals: 


they played video games at the height Agee cua: 


30s 


20s 20s 40s 30s 40s 20s 30s 30s 20s 


of their video game use. They also per- 


formed Top Gun drills, and received a Total cholesterol: 


135 


GO 99 NOT, Seo Ge 2 02 eee One S42 


score based on the time to complete 
the drill and the number of errors 
made, with lower scores indicating better performance. 


The first individual is in the 30s with a total 
cholesterol of 135, the second in the 20s with total 
cholesterol of 160, and so forth. 

(a) Use graphical and numeri- 


cal summaries to compare the 


Never played: 9379 8302 5489 5334 4605 4789 9185 7216 9930 dihvae éhawel vanities, tow alas 
4828 5655 4623 7778 8837 5947 cholesterol change with age? 

Under three hours: 5540 6259 5163 6149 4398 3968 7367 4217 5716 (b) The ideal range of total 

Three or more hours: 7288 4010 4859 4432 4845 5394 2703 5797 3758 cholesterol is below 200 mg/dL. 


Compare the distributions for the three groups. How is 
prior video game experience related to Top Gun scores? 


For individuals with elevated 
cholesterol levels, prescription drugs are often recom- 
mended to lower levels. Among the 3044 participants 


between 20 and 50 years of age, 4 individuals in 
their 20s, 24 individuals in their 30s, and 117 indi- 
viduals in their 40s were taking prescription medica- 
tions to reduce their cholesterol levels. How do you 
think your comparison of the distribution would be 
changed if none of the individuals was taking medi- 
cation? Explain. 


Exercises 2.50 to 2.53 make use of the optional material on the 
1.5 X IQR rule for suspected outliers. 


2.50 Graduation rates. In Exercise 1.10 (page 32) you 
were asked to use a stemplot to display the distribu- 
tion of the percents of on-time high school graduates 
in the states. Stemplots help you find the five-num- 
ber summary because they arrange the observations 
in increasing order. 


(a) Give the five-number summary of this distribution. 
(b) Use the five-number summary to draw a boxplot 
of all the data. What is the shape of the distribution? 
(c) Which observations does the 1.5 X IQR rule flag 
as suspected outliers? Is there a simple explanation 


for the outlier(s)? 2y))) GRADRATE 


Foreign born. Figure 1.10 (page 30) gives a stemplot 
of the percent of state residents who were born outside 
the United States for the 50 states and the District of 
Columbia. FOREIGN 


(a) Give the five-number summary of this distribution. 


2.51 
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(b) Is California an outlier or just the largest obser- 
vation in a strongly skewed distribution? What does 
the 1.5 X IQR rule say? 


The Fortune Global 500. The Fortune Global 500, 
also known as the Global 500, is an annual ranking 
of the top 500 corporations worldwide as measured by 
revenue. Table 2.3 provides a list of the 31 companies 
with the highest revenues (in billions of dollars) in 
2013.%* A stemplot or histogram shows that the distri- 


bution is strongly skewed to the right. £j),| GLoBesoo 


(a) Give the five-number summary. Explain why 
this summary suggests that the distribution is right- 
skewed. 

(b) Which companies are outliers according to the 
1.5 X IQR rule? Make a stemplot of the data. Do you 
agree with the rule’s suggestions about which compa- 
nies are and are not outliers? 

(c) If you consider all 500 companies, the 31 compa- 
nies in Table 2.3 represent the high outliers among 
all Global 500 companies. Is there a common feature 
shared by many of the 31 companies in the table? 


Cholesterol for people in their 20s. Exercise 2.49 
contains the cholesterol levels of individuals in their 
20s from the NHANES survey in 2009-10. The cho- 
lesterol levels are right-skewed, with a few large cho- 
lesterol levels. Which cholesterol levels are suspected 
outliers by thel.5 IQR rule? £jy) cHoLes2o 


COMPANY NAME REVENUES ($B) COMPANY NAME REVENUES ($B) 
Royal Dutch Shell 481.7 ENI 167.9 
Wal-Mart Stores 469.2 Berkshire Hathaway 162.5 
Exxon Mobil 449.9 Apple 156.5 
Sinopec Group 428.2 AXA 154.6 
China National Petroleum 408.6 Gazprom 158°5 
BP 388.3 General Motors 152.3) 
State Grid 298.4 Daimler 146.9 
Toyota Motor 265.7 General Electric 146.9 
Volkswagen 247.6 Petrobras 144.1 
Total 234.3 Exor Group 142.2 
Chevron 233.9 Valero Energy 138.3 
Glencore Xstrata 214.4 Ford Motor 134.3 
Japan Post Holdings 190.9 Industrial G Commercial Bank of China 133.6 
Samsung Electronics 178.6 Hon Hai Precision Industry IBPol 
E.ON 169.8 Allianz 130.8 
Phillips 66 169.6 
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Exploring the Web 


2.54 Home run leaders. The three top players on the career home run list are Barry 


2.55 


Bonds, Hank Aaron, and Babe Ruth. You can find their home run statistics by go- 
ing to the website www. baseball-reference. comand then clicking on the 
Players tab at the top of the page. Construct three side-by-side boxplots compar- 
ing the yearly home run production of Barry Bonds, Hank Aaron, and Babe Ruth. 
Describe any differences that you observe. It is worth noting that in his first four 
seasons, Babe Ruth was primarily a pitcher. If these four seasons are ignored, how 
does Babe Ruth compare with Barry Bonds and Hank Aaron? 


Crime rates and outliers. The Statistical Abstract of the United States is a compre- 
hensive summary of statistics on the social, political, and economic organization of the 
United States. It can be found at the website www. census .gov/compendia/ 
statab/.Go to the section Law Enforcement, Courts and Prisons, and then to the 
subsection Crimes and Crime Rates. Several tables of data will be available. 

(a) Open the Table on Crime Rates by State and Type for the latest year given. 
Why do you think they use rates per 100,000 population rather than the number 
of crimes committed? The District of Columbia is a high outlier in several crime 
categories. 

(b) Open the Table on Crime Rates by Type for Selected Large Cities. This table 
includes the District of Columbia, which is listed as Washington, DC. Without do- 
ing any formal calculations, does the District of Columbia look like a high outlier 
in the table for large cities?’ Whether or not the District of Columbia is an outlier 
depends on more than its crime rate. It also depends on the other observations 
included in the data set. Which data set do you feel is more appropriate for the 
District of Columbia? 

(c) Using the Table on Crime Rates by State and Type for the latest year given, 
choose a crime category, and give a full description of its distribution over the 
50 states, omitting the District of Columbia. Your description of the distribution 
should include appropriate graphical and numerical summaries and a brief report 
describing the main features of the distribution. You can open the data as an Excel 
file and import it into your statistical software. 
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The Normal Distributions 


distributions. What is more, we have a clear strategy for exploring data on 
a single quantitative variable. 


WAVE now have a tool box of graphical and numerical methods for describing 


Exploring a Distribution 


1. Always plot your data: make a graph, usually a histogram or a stemplot. 


2. Look for the overall pattern (shape, center, variability) and for striking devia- 
tions such as outliers. 


3. Calculate a numerical summary to briefly describe center and variability. 


In this chapter, we add one more step to this strategy: 


4. Sometimes the overall pattern of a large number of observations is so regular 
that we can describe it by a smooth curve. 


Figure 3.1 is a histogram of the scores of all 947 seventh-grade students in Gary, 
Indiana, on the vocabulary part of the Iowa Tests of Basic Skills.! Scores of many 
students on this national test have a quite regular distribution. The histogram is 


CRATER 


In this chapter 
we cover... 


3.1 
3.2 
3.3 
3.4 
3.5 


3.6 
3.7 
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Density curves 
Describing density curves 
Normal distributions 

The 68-95-99.7 rule 


The standard Normal 
distribution 


Finding Normal proportions 


Using the standard Normal 
table 


Finding a value given a 
proportion 
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2 4 6 8 10 iD) 
lowa Tests vocabulary score 


FIGURE 3.1 
Histogram of the /owa Tests vocabulary scores of all seventh-grade students in Gary, Indiana. The smooth 
curve shows the overall shape of the distribution. 


symmetric, and both tails fall off smoothly from a single center peak. There are 
no large gaps or obvious outliers. The smooth curve drawn through the tops of 
the histogram bars in Figure 3.1 is a good description of the overall pattern of 
the data. 


=a Our eyes respond to the areas of the bars in a histogram. The bar areas represent 
proportions of the observations. Figure 3.2(a) is a copy of Figure 3.1 with the left- 
most bars shaded. The area of the shaded bars in Figure 3.2(a) represents the stu- 
dents with vocabulary scores of 6.0 or lower. There are 287 such students, who 
make up the proportion 287/947 = 0.303 of all Gary seventh-graders. 

Now look at the curve drawn through the bars. In Figure 3.2(b), the area under 
the curve to the left of 6.0 is shaded. We can draw histogram bars taller or shorter 
by adjusting the vertical scale. In moving from histogram bars to a smooth curve, we 
make a specific choice: adjust the scale of the graph so that the total area under the 
curve is exactly 1. The total area represents the proportion 1, that is, all the observa- 
tions. We can then interpret areas under the curve as proportions of the observa- 
tions. The curve is now a density curve. The shaded area under the density curve in 
Figure 3.2(b) represents the proportion of students with score 6.0 or lower. This 
area is 0.293, only 0.010 away from the actual proportion 0.303. The method for 
finding this area will be presented shortly. For now, note that the areas under the 
density curve give quite good approximations to the actual distribution of the 947 
test scores. m 
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lowa Tests vocabulary score lowa Tests vocabulary score 
(a) (b) 
FIGURE 3.2(a) FIGURE 3.2(b) 
The proportion of scores less than or equal to 6.0 in the actual data The proportion of scores less than or equal to 6.0 from the density 
is 0.303. curve is 0.293. The density curve is a good approximation to the 


distribution of the data. 


Density Curve 


A density curve is a curve that 
© Is always on or above the horizontal axis. 
«= Has area exactly 1 underneath it. 


A density curve describes the overall pattern of a distribution. The area under the curve 
and above any range of values is the proportion of all observations that fall in that range. 


Density curves, like distributions, come in many shapes. Figure 3.3 shows a strongly 
skewed distribution, the survival times of guinea pigs from Exercise 2.31 (page 68). 


FIGURE 3.3 
A right-skewed distribution pictured by 
both a histogram and a density curve. 


0 100 200 300 400 500 600 
Survival time (days) 
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The histogram and density curve were both created from the data by software. 
Both show the overall shape and the “bumps” in the long right tail. The density 
curve shows a single high peak as a main feature of the distribution. The histogram 
divides the observations near the peak between two bars, thus reducing the height 
of the peak. A density curve is often a good description of the overall pattern of 
a distribution. Outliers, which are deviations from the overall pattern, are not 
described by the curve. Of course, no set of real data is exactly described by 
A a density curve. The curve is an idealized description that is easy to use and 
accurate enough for practical use. 


Apply Your Knowledge 


FIGURE 3.4 

The density curve for the location of 
accidents along a 5-mile bike path, 
for Exercise 3.2. 


3.1. Sketch Density Curves. Sketch density curves that describe distributions 
with the following shapes: 


(a) Symmetric, but with two peaks (that is, two strong clusters of observations) 
(b) Single peak and skewed to the left 


3.2 Accidents on a Bike Path. Examining the location of accidents on a level, 
5-mile bike path shows that they occur uniformly along the length of the path. 
Figure 3.4 displays the density curve that describes the distribution of accidents. 


Height - 1/5 -- 


0) 1 2 3 4 5 
Distance along bike path (miles) 


(a) Explain why this curve satisfies the two requirements for a density curve. 

(b) The proportion of accidents that occur in the first 2 miles of the path is 
the area under the density curve between 0 miles and 2 miles. What is 
this area? 

(c) There is a stream alongside the bike path between the 0.7 mile mark 
and the 1.5 mile mark. What proportion of accidents happen on the 
bike path alongside the stream? 

(d) There are roads at the two ends of the bike path, but the remainder 
of the paved bike path goes through the woods. What proportion 
of accidents happen more than | mile from either road? (Hint: First 
determine where on the bike path the accident needs to occur to be 
more than 1 mile from either road, and then find the area.) 


Our measures of center and variability apply to density curves as well as to actual 
sets of observations. The median and quartiles are easy. Areas under a density curve 
represent proportions of the total number of observations. The median is the point 


3.2 Describing Density Curves 


with half the observations on either side. So the median of a density curve is the equal- 
areas point, the point with half the area under the curve to its left and the remain- 
ing half of the area to its right. The quartiles divide the area under the curve into 
quarters. One-fourth of the area under the curve is to the left of the first quartile, 
and three-fourths of the area is to the left of the third quartile. You can roughly 
locate the median and quartiles of any density curve by eye by dividing the area 
under the curve into four equal parts. 

Because density curves are idealized patterns, a symmetric density curve is exactly 
symmetric. The median of a symmetric density curve is therefore at its center. 
Figure 3.5(a) shows a symmetric density curve with the median marked. It isn’t so 
easy to spot the equal-areas point on a skewed curve. There are mathematical ways 
of finding the median for any density curve. That’s how we marked the median on 
the skewed curve in Figure 3.5(b). 


; t | Mean 
Median and mean Median 
(a) (b) 
FIGURE 3.5(a) FIGURE 3.5(b) 
The median and mean of a symmetric density The median and mean of a right-skewed density curve. The mean 
curve both lie at the center of symmetry. is pulled away from the median toward the long tail. 


What about the mean? The mean of a set of observations is their arithmetic 
average. If we think of the observations as weights strung out along a thin rod, the 
mean is the point at which the rod would balance. This fact is also true of density 
curves. The mean is the point at which the curve would balance if made of solid material. 
Figure 3.6 illustrates this fact about the mean. A symmetric curve balances at its 
center because the two sides are identical. The mean and median of a symmetric 
density curve are equal, as in Figure 3.5(a). We know that the mean of a skewed 
distribution is pulled toward the long tail. Figure 3.5(b) shows how the mean of a 
skewed density curve is pulled toward the long tail more than is the median. It’s 
hard to locate the balance point by eye on a skewed curve. There are mathematical 
ways of calculating the mean for any density curve, so we are able to mark the mean 
as well as the median in Figure 3.5(b). 


FIGURE 3.6 
The mean is the balance point of a density curve. 


The long right tail pulls 
the mean to the right. 
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Median and Mean of a Density Curve 


The median of a density curve is the equal-areas point, the point that divides the 
area under the curve in half. 

The mean of a density curve is the balance point at which the curve would balance 
if made of solid material. 

The median and mean are the same for a symmetric density curve. They both lie at 
the center of the curve. The mean of a skewed curve is pulled away from the median 
in the direction of the long tail. 


Because a density curve is an idealized description of a distribution of data, we 
need to distinguish between the mean and standard deviation of the density curve 
and the mean x and standard deviation s computed from the actual observations. The 

mean of a density curve 4 —_ usual notation for the mean of a density curve is pz (the Greek letter mu). We write 
standard deviation of a density _ the standard deviation of a density curve as o (the Greek letter sigma). We can 
curve 7 roughly locate the mean yp of any density curve by eye, as the balance point. There 

is no easy way to locate the standard deviation o by eye for density curves in general. 


(9) LaunchPad Online Resources 


* The StatBoards video, Density Curves, provides additional examples of density 
curves. 


Apply Your Knowledge 


3.3. Mean and Median. What is the mean p of the density curve pictured in 
Figure 3.4 on page 78? (That is, where would the curve balance? ) What is 
the median? (That is, where is the point with area 0.5 on either side?) 

3.4 Mean and Median. Figure 3.7 displays three density curves, each with 
three points marked on it. At which of these points on each curve do the 
mean and the median fall? 


FIGURE 3.7 
Three density curves, for Exercise 3.4. 


One particularly important class of density curves has already appeared in 
Normal curves _ Figures 3.1 and 3.2. They are called Normal curves. The distributions they describe 
Normal distributions _ are called Normal distributions. Normal distributions play a large role in statistics, 
but they are rather special and not at all “normal” in the sense of being usual or 
average. We capitalize Normal to remind you that these curves are special. Look at 

the two Normal curves in Figure 3.8. They illustrate several important facts: 


» All Normal curves have the same overall shape: symmetric, single-peaked, 
bell-shaped. 

e Any specific Normal curve is completely described by giving its mean ps and 
its standard deviation o. 


3.3 Normal Distributions 6B 


FIGURE 3.8 
Two Normal curves, showing the mean yw and standard deviation o. 


= The mean is located at the center of the symmetric curve and is the same as 
the median. Changing yz without changing a moves the Normal curve along 
the horizontal axis without changing its variability. 


« The standard deviation o controls the variability of a Normal curve. When 
the standard deviation is larger, the area under the normal curve is less 
concentrated about the mean. 


The standard deviation o is the natural measure of variability for Normal distri- 
butions. Not only do and o@ completely determine the shape of a Normal curve, 
but we can also locate ao by eye on a Normal curve. Here’s how. Imagine that you 
are skiing down a mountain that has the shape of a Normal curve. At first, you 
descend at an ever-steeper angle as you go out from the peak: 


/ 


Fortunately, before you find yourself going straight down, the slope begins to grow 
flatter rather than steeper as you go out and down: 


a ‘. 


The points at which this change of curvature takes place are located at distance 
on either side of the mean mw. You can feel the change as you run a pencil along a 
Normal curve, and so find the standard deviation. Remember that yz and o alone 

do not specify the shape of most distributions, and that the shape of density 
A curves in general does not reveal a. These are special properties of Normal 
distributions. 


Normal Distributions 


A Normal distribution is described by a Normal density curve. Any particular Nor- 
mal distribution is completely specified by two numbers: its mean and standard 
deviation o. 

The mean of a Normal distribution is at the center of the symmetric Normal curve. 
The standard deviation is the distance from the center to the change-of-curvature 
points on either side. 
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FIGURE 3.9 
The 68—95-99,.7 rule for Normal 
distributions. 


Why are the Normal distributions important in statistics? Here are three reasons. 
First, Normal distributions are good descriptions for some distributions of real data. 
Distributions that are often close to Normal include scores on tests taken by many 
people (such as Iowa Tests and SAT exams), repeated careful measurements of the 
same quantity, and characteristics of biological populations (such as lengths of 
crickets and yields of corn). Second, Normal distributions are good approximations 
to the results of many kinds of chance outcomes, such as the proportion of heads in 
many tosses of a coin. Third, we will see that many statistical inference procedures 
based on Normal distributions work well for other roughly symmetric distributions. 
However, many sets of data do not follow a Normal distribution. Most income dis- 
tributions, for example, are skewed to the right and so are not Normal. Non-Normal 
data, like non-normal people, not only are common but also are sometimes more 
interesting than their Normal counterparts. 


Although there are many Normal curves, they all have common properties. In 
particular, all Normal distributions obey the following rule. 


The 68-95-99.7 Rule 


In the Normal distribution with mean pw and standard deviation o: 

» Approximately 68% of the observations fall within o of the mean wp. 
= Approximately 95% of the observations fall within 20 of pw. 

= Approximately 99.7% of the observations fall within 30 of pw. 


Figure 3.9 illustrates the 68—95—99.7 rule. By remembering these three numbers, 
you can think about Normal distributions without constantly making detailed 
calculations. 


i<— 68% of data —> 
95% of data 


99.7% of data 


T T 
U-30 b-20 p-o u U+o +20 w+30 
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Figures 3.1 and 3.2 (see pages 76 and 77) show that the distribution of lowa Tests 
vocabulary scores for seventh-grade students in Gary, Indiana, is close to Normal. 
Suppose that the distribution is exactly Normal with mean jw = 6.84 and standard 
deviation o = 1.55. (These are the mean and standard deviation of the 947 actual 
scores.) 

Figure 3.10 applies the 68—95-99.7 rule to the lowa Tests scores. The 95 part 
of the rule says that approximately 95% of all scores are between 


be — 20 = 6.84 — (2)(1.55) = 6.84 — 3.10 = 3.74 
and 
b+ 20 = 6.84 + (2)(1.55) = 6.84 + 3.10 = 9.94 


The other 5% of scores are outside this range. Because Normal distributions are sym- 
metric, half of these scores are lower than 3.74 and half are higher than 9.94. That 
is, about 2.5% of the scores are below 3.74 and 2.5% are above 9.94. = 


One standard deviation 
is 1.55. 


- 68% of data 


95% of data 
2.5% of scores 
are below 3.74. 


- 99.7% of data 


aA) 3.74 pg) 6.84 Gc) 9.94 11.49 
lowa Tests score 


such as the actual Gary scores are never exactly Normal. For one thing, Iowa 

Tests scores are reported only to the nearest tenth. A score can be 9.9 or 
10.0, but not 9.94. We use a Normal distribution because it’s a good approximation 
and because we think the knowledge that the test measures is continuous rather 
than stopping at tenths. 

How well does our work in Example 3.2 describe the actual Iowa Tests scores? 
Well, 900 of the 947 scores are between 3.74 and 9.94. That’s 95.04%, very accurate 
indeed. Of the remaining 47 scores, 20 are below 3.74 and 27 are above 9.94. The 
tails of the actual data are not quite equal, as they would be in an exactly Normal 
distribution. Normal distributions often describe real data better in the center of 
the distribution than in the extreme high and low tails. 


A The 68—95—99.7 rule describes distributions that are exactly Normal. Real data 
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FIGURE 3.10 

The 68-95-99,.7 rule applied to the 
distribution of lowa Tests scores for 
seventh-grade students in Gary, Indiana, 
for Example 3.2. The mean and standard 
deviation are w = 6.84 and a = 1.55. 


cde.gov 
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Look again at Figure 3.10. A score of 5.29 is one standard deviation below the mean. 
What percent of scores are higher than 5.29? Find the answer by adding areas in the 
figure. Here is the calculation in pictures: 


+ — 
68% \ 84% 
529 8.39 8.39 5.29 


percent between 5.29 and 8.39 + percent above 8.39 = percent above 5.29 
68% + 16% = 84% 


So approximately 84% of scores are higher than 5.29. Be sure you understand 
where the 16% came from. We know that 68% of scores are between 5.29 and 
8.39, so 32% of scores are outside that range. These are equally split between the 
two tails, with 16% below 5.29 and 16% above 8.39. = 


Because we will mention Normal distributions often, a short notation is helpful. 
We abbreviate the Normal distribution with mean p and standard deviation o as 
N(p, o). For example, the distribution of Gary Iowa Tests scores is approximately 


N(6.84, 1.55). 


() LaunchPad Online Resources 


* The StatClips Examples video, The Normal Distribution Example B, works 
through an example of using the 68—95-—99.7 rule to compute areas under 
the Normal curve. 


Apply Your Knowledge 
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3.5 Upper Arm Lengths. Anthropomorphic data are measurements on the 
human body that can track growth and weight of infants and children and 
evaluate changes in the body that occur over the adult life span. The result- 
ing data can be used in areas as diverse as ergonomics and clothing design. 
The upper arm length of females over 20 years old in the United States is ap- 
proximately Normal with mean 35.8 centimeters (cm) and standard deviation 
2.1. cm. Draw a Normal curve on which this mean and standard deviation are 
correctly located. (Hint: Draw an unlabeled Normal curve, locate the points 
where the curvature changes, then add number labels on the horizontal axis.) 

As seen in Figure 3.11, the upper arm length is measured from the acro- 
mion process, the highest point of the shoulder, down the posterior surface of 
the arm to the tip of the olecranon process, the bony part of the mid-elbow.’ 


3.6 Upper Arm Lengths. The upper arm length of females over 20 years old 
in the United States is approximately Normal with mean 35.8 centimeters 
(cm) and standard deviation 2.1 cm. Use the 68—95—99.7 rule to answer the 
following questions. (Start by making a sketch like Figure 3.10.) 


FIGURE 3.11 


(a) What range of lengths covers almost all (99.7%) of this distribution? 
Correct tape placement when measuring 


upper arm length, for Exercise 3.5. (b) What percent of women over 20 have upper arm lengths less than 33.7 cm? 
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3.7. Monsoon Rains. The summer monsoon rains bring 80% of India’s rainfall 
and are essential for the country’s agriculture. Records going back more than 
a century show that the amount of monsoon rainfall varies from year to year 
according to a distribution that is approximately Normal with mean 852 mil- 
limeters (mm) and standard deviation 82 mm.’ Use the 68—95-99.7 rule to 
answer the following questions. 


(a) Between what values do the monsoon rains fall in 95% of all years? 


(b) How small are the monsoon rains in the driest 2.5% of all years? 


As the 68-95-99,7 rule suggests, all Normal distributions share many properties. 
In fact, all Normal distributions are the same if we measure in units of size o about 
the mean yp as center. Changing to these units is called standardizing. To standardize a 
value, subtract the mean of the distribution and then divide by the standard deviation. 


Standardizing and z-Scores 


If x is an observation from a distribution that has mean pw and standard deviation o, 
the standardized value of x is 


A standardized value is often called a z-score. 


A xscore tells us how many standard deviations the original observation falls 
away from the mean, and in which direction. Observations larger than the mean are 
positive when standardized, and observations smaller than the mean are negative. 


The heights of women aged 20 to 29 in the United States are approximately Normal 
with w = 64.2 inches and o = 2.8 inches.* The standardized height is 

_ height — 64.2 

7 2.8 
A woman's standardized height is the number of standard deviations by which her 
height differs from the mean height of all women aged 20 to 29. A woman 70 inches 
tall, for example, has standardized height 

_ 70 — 64.2 


= 2.07 
2.8 


or 2.07 standard deviations above the mean. Similarly, a woman 5 feet (60 inches) 
tall has standardized height 


_ 60 — 64.2 
2.8 


or 1.50 standard deviations less than the mean height. m= 


= —1.50 


We often standardize observations from symmetric distributions to express them 
in a common scale. We might, for example, compare the heights of two children of 


ec 

FAM HE SAID, 

WAM SHE SAID. 
Height, weight, and body mass 
distributions in this book come from 
actual measurements by a government 
survey. That is a good thing. When 
asked their weight, almost all women 
say they weigh less than they really do. 
Heavier men also underreport their 
weight—but lighter men claim to 
weigh more than the scale shows. We 
leave you to ponder the psychology 
of the two sexes. Just remember that 
“say-so” is no substitute for measuring. 
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different ages by calculating their z-scores. The standardized heights tell us where 
each child stands in the distribution for his or her age group. 

If the variable we standardize has a Normal distribution, standardizing does 
more than give a common scale. It makes all Normal distributions into a single 
distribution, and this distribution is still Normal. Standardizing a variable that has 
any Normal distribution produces a new variable that has the standard Normal 
distribution. 


Standard Normal Distribution 


The standard Normal distribution is the Normal distribution N(0, 1) with mean 
0 and standard deviation 1. 

If a variable x has any Normal distribution N(, 7) with mean pw and standard 
deviation o, then the standardized variable 


has the standard Normal distribution. 


() LaunchPad Online Resources 


* The Snapshots video, Normal Distributions, gives a nice example of comparing 
two Normal curves using both the 68-95-99.7 rule and z-scores. 


Apply Your Knowledge 
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3.8 SAT versus ACT. In 2013, when she was a high school senior, Idonna 
scored 670 on the mathematics part of the SAT The distribution of SAT 
math scores in 2013 was Normal with mean 514 and standard deviation 
118. Jonathan took the ACT and scored 26 on the mathematics portion. 
ACT math scores for 2013 were Normally distributed with mean 20.9 and 
standard deviation 5.3. Find the standardized scores for both students. 
Assuming that both tests measure the same kind of ability, who had the 
higher score? 

3.9 Men’s and Women’s Heights. The heights of women aged 20 to 29 in the 
United States are approximately Normal with mean 64.2 inches and stan- 
dard deviation 2.8 inches. Men the same age have mean height 69.4 inches 
with standard deviation 3.0 inches.° What are the z-scores for a woman 6 feet 
tall and a man 6 feet tall? Say in simple language what information the 
z-scores give that the original nonstandardized heights do not. 


Areas under a Normal curve represent proportions of observations from that 
Normal distribution. There is no formula for areas under a Normal curve. Calculations 
use either software that calculates areas or a table of areas. Most tables and software 
calculate one kind of area, cumulative proportions. The idea of “cumulative” is “every- 
thing that came before.” Here is the exact statement. 


3.6 Finding Normal Proportions 


Cumulative Proportions 


The cumulative proportion for a value x in a distribution is the proportion of ob- 
servations in the distribution that are less than or equal to x. 


Cumulative 
proportion 


The key to calculating Normal proportions is to match the area you want with 
areas that represent cumulative proportions. If you make a sketch of the area you 
want, you will almost never go wrong. Find areas for cumulative proportions either 
from software or (with an extra step) from a table. The following example shows 
the method in a picture. 


The National Collegiate Athletic Association (NCAA) uses a sliding scale for eligibility 
for Division | athletes.” Those students with a 2.5 high school GPA must score at least 
820 on the combined mathematics and critical reading parts of the SAT to compete 
in their first college year. The combined scores of the almost 1.7 million high school 
seniors taking the SAT in 2013 were approximately Normal with mean 1011 and 
standard deviation 216. What percent of high school seniors meet this SAT require- 
ment of a combined score of 820 or better? 

Here is the calculation in a picture: the proportion of scores above 820 is the area 
under the curve to the right of 820. That’s the total area under the curve (which is 
always 1) minus the cumulative proportion up to 820. 


820 820 
area right of 820 = total area — area left of 820 
= 1 = 0.1883 =08117 


About 81% of all high school seniors meet this SAT requirement of a combined math 
and reading score of 820 or higher. m 


There is no area under a smooth curve and exactly over the point 820. 
Consequently, the area to the right of 820 (the proportion of scores >820) is the 
same as the area at or to the right of this point (the proportion of scores =820). 
The actual data may contain a student who scored exactly 820 on the SAT. That 
the proportion of scores exactly equal to 820 is 0 for a Normal distribution is a con- 
sequence of the idealized smoothing of Normal distributions for data. 

To find the numerical value 0.1883 of the cumulative proportion in Example 3.5 
using software, enter the mean 1011 and standard deviation 216 and ask for the 
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cumulative proportion for 820. Software often uses terms such as “cumulative distri- 
bution” or “cumulative probability.” We will learn in Chapter 12 why the language 
of probability fits. Here, for example, is Minitab’s output: 


Cumulative Distribution Function 


Normal with mean = 1011 and standard deviation = 216 


x P( Xs x) 
820 0.188278 


observations.” The Normal Density Curve applet is even handier because it draws 
pictures as well as finding areas. If you are not using software, you can find cumula- 
tive proportions for Normal curves from a table. This requires an extra step. 


The extra step in finding cumulative proportions from a table is that we must 
first standardize to express the problem in the standard scale of z-scores. This allows 
us to get by with just one table, a table of standard Normal cumulative proportions. 
Table A in the back of the book gives cumulative proportions for the standard 
Normal distribution. The pictures at the top of the table remind us that the entries 
are cumulative proportions, areas under the curve to the left of a value z. 


@ The P in the output stands for “probability,” but we can read it as “proportion of the 


What proportion of observations on a standard Normal variable z take values less 
than 1.47? 


Solution: To find the area to the left of 1.47, locate 1.4 in the left-hand column of 
Table A, then locate the remaining digit 7 as 0.07 in the top row. The entry opposite 
1.4 and under 0.07 is 0.9292. This is the cumulative proportion we seek. Figure 
3.12 illustrates this area. m 


Table entry: 
area = .9292 


2=1.47 


FIGURE 3.12 
The area under a standard Normal curve to the left of the point z = 1.47 is 0.9292. Table A gives areas 
under the standard Normal curve. 


Now that you see how Table A works, let’s redo Example 3.5 using the table. We 
can break Normal calculations using the table into three steps. 


3.7 Using the Standard Normal Table 


Scores of high school seniors on the SAT follow the Normal distribution with mean 
pm = 1011 and standard deviation @ = 216. What percent of seniors score at least 820? 


Step 1. Draw a picture. The picture is exactly as in Example 3.5. It shows that 
area to the right of 820 = 1 — area to the left of 820 


Step 2. Standardize. Call the SAT score x. Subtract the mean, then divide by the stan- 
dard deviation, to transform the problem about x into a problem about a standard Normal z: 
x = 820 
x — 1011 7 820 — 1011 
216 216 

zZ = —0.88 


Step 3. Use the table. The picture shows that we need the cumulative propor- 
tion for x = 820. Step 2 says this is the same as the cumulative proportion for 
zZ = —0.88. The Table A entry for z = —0.88 says that this cumulative proportion 
is 0.1894. The area to the right of —0.88 is therefore 1 — 0.1894 = 0.8106. m 


The area from the table in Example 3.7 (0.8106) is slightly less accurate than the 
area from software in Example 3.5 (0.8117) because we must round z to two decimal 
places when we use Table A. The difference is rarely important in practice. Here’s 
the method in outline form. 


Using Table a to Find Normal Proportions 


Step 1. State the problem in terms of the observed variable x. Draw a picture that 
shows the proportion you want in terms of cumulative proportions. 


Step 2. Standardize x to restate the problem in terms of a standard Normal variable z. 


Step 3. Use Table A and the fact that the total area under the curve is 1 to find the 
required area under the standard Normal curve. 
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Recall that the NCAA uses a sliding scale for eligibility for Division | athletics. Students 
with a 2.5 GPA must have a combined SAT score of 820 or higher to be eligible. 
Students with lower GPAs will require higher SAT scores for eligibility, whereas stu- 
dents with higher GPAs can have a lower SAT score and still be eligible. For example, 
students with a 2.75 GPA are only required to have a combined SAT score that is 
at least 720. What proportion of all students who take the SAT would meet an SAT 
requirement of at least 720, but not 820? 


Step 1. State the problem and draw a picture. Call the SAT score x. The 
variable x has the N(1011, 216) distribution. What proportion of SAT scores fall 
between 720 and 820? Here is the picture: 


720 820 820 720 
z =—0.88 2=-1.35 


20° 
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Step 2. Standardize. Subtract the mean, then divide by the standard deviation, 
to turn x into a standard Normal z: 
720s x < 820 
720 — 1011 x— 1011 820 — 1011 
< < 
216 216 216 
=1.35 = Z < —0.88 


Step 3. Use the table. Follow the picture (we added the z-scores to the picture 
label to help you): 
area between — 1.35 and —0.88 = (area left of —0.88) — (area left of — 1.35) 
= 0.1894 — 0.0885 = 0.1009 


About 10% of high school seniors have SAT scores between 720 and 820. m 


Sometimes we encounter a value of z more extreme than those appearing in Table 
A. For example, the area to the left of y = —4 is not given directly in the table. The 
z-values in Table A leave only area 0.0002 in each tail unaccounted for. For practical 
purposes, we can act as if there is approximately zero area outside the range of Table 
A. Specifically, we act as if there is approximately zero area below xy = —3.5 and 
approximately zero area above zy = 3.5. While saying the area above x = 3.5 is 0.0002 
versus saying the area is approximately 0 makes little difference in statistical practice, 
conceptually it is important to remember that these areas are not actually zero. 


(9) LaunchPad Online Resources 


* The StatBoards video, Finding a Proportion Given a Value, provides the 
details of finding a proportion for a Normal distribution using the standard 
Normal curve. 
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3.10 Use the Normal Table. Use Table A to find the proportion of observa- 
tions from a standard Normal distribution that satisfies each of the following 
statements. In each case, sketch a standard Normal curve and shade the area 
under the curve that is the answer to the question. 


(€) ar a led OL) (3) & = =i 
(c) x < 1.45 (Gl) =O — 2 < 45 


3.11 Monsoon Rains. The summer monsoon rains in India 
follow approximately a Normal distribution with mean 852 
millimeters (mm) of rainfall and standard deviation 82 mm. 


(a) In the drought year 1987, 697 mm of rain fell. In what 
percent of all years will India have 697 mm or less of 
monsoon rain? 


(b) “Normal rainfall” means within 20% of the long-term 
average, or between 682 mm and 1022 mm. In what 
percent of all years is the rainfall normal? 


3.12 The Medical College Admissions Test. Almost all 
medical schools in the United States require students to take 
the Medical College Admission Test (MCAT).° The exam is 
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composed of three multiple-choice sections (Physical Sciences, Verbal Reasoning, 
and Biological Sciences). The score on each section is converted to a 15-point 
scale so that your total score has a maximum value of 45. The total scores follow 
a Normal distribution, and in 2013 the mean was 25.3 with a standard deviation 
of 6.5. There is little change in the distribution of scores from year to year. 


(a) What proportion of students taking the MCAT had a score over 30? 
(b) What proportion had scores between 20 and 25? 


Examples 3.5 to 3.8 illustrated the use of software or Table A to find what pro- 
portion of the observations satisfies some condition, such as “SAT score above 820.” 
We may instead want to find the observed value with a given proportion of the 
observations above or below it. Statistical software will do this directly. 


Scores on the SAT critical reading test in 2013 follow approximately the N(496, 115) 
distribution. How high must a student score to place in the top 10% of all students 
taking the SAT? 

We want to find the SAT score x with area 0.1 to its right under the Normal curve 
with mean w = 496 and standard deviation @ = 115. That’s the same as finding the 
SAT score x with area 0.9 to its /eft. Figure 3.13 poses the question in graphical form. 
Most software will tell you x when you enter mean 496, standard deviation 115, and 
cumulative proportion 0.9. Here is Minitab’s output: 


Inverse Cumulative Distribution Function 


Normal with mean = 496 and standard deviation 
x 


0.9 643.378 


Minitab gives x = 643.378. So scores above 644 are in the top 10%. (Round up 
because SAT scores can only be whole numbers.) = 


Area=0.10 


x=496 x=? 
2=0 2=128 


Without software, use Table A backward. Find the given proportion in the body 
of the table and then read the corresponding z from the left column and top row. 
There are again three steps. 


FIGURE 3.13 

Locating the point on a Normal curve 
with area 0.10 to its right, for 
Examples 3.9 and 3.10. 
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FIGURE 3.14 
Locating the first quartile of a Normal 
curve, for Example 3.11. 


Scores on the SAT critical reading test in 2013 follow approximately the N(496, 115) 
distribution. How high must a student score in order to place in the top 10% of all 
students taking the SAT? 


Step 1. State the problem and draw a picture. This step is exactly as in 
Example 3.9. The picture is Figure 3.13. The x-value that puts a student in the top 
10% is the same as the x-value for which 90% of the area is to the left of x. 


Step 2. Use the table. Look in the body of Table A for the entry closest to 0.9. 
It is 0.8997. This is the entry corresponding to z = 1.28. So z = 1.28 is the stan- 
dardized value with area 0.9 to its left. 


Step 3. Unstandardize to transform z back to the original x scale. We know that 
the standardized value of the unknown x is z = 1.28. This means that x itself lies 
1.28 standard deviations above the mean on this particular Normal curve. That is, 


xX = mean + (1.28)(standard deviation) 
= 496 + (1.28)(115) = 643.2 


A student must score at least 644 to place in the highest 10%. m 


High levels of cholesterol in the blood increase the risk of heart disease. For 14-year- 
old boys, the distribution of blood cholesterol is approximately Normal with mean 
fe = 170 milligrams of cholesterol per deciliter of blood (mg/dL) and standard devia- 
tion @ = 30 mg/dL.? What is the first quartile of the distribution of blood cholesterol? 


Step 1. State the problem and draw a picture. Call the cholesterol level x. The 
variable x has the N(170, 30) distribution. The first quartile is the value with 25% 
of the distribution to its left. Figure 3.14 is the picture. 


Step 2. Use the table. Look in the body of Table A for the entry closest to 0.25. 
It is 0.2514. This is the entry corresponding to z = —0.67. So z = —0.67 is the 
standardized value with area 0.25 to its left. 


Step 3. Unstandardize. The cholesterol level corresponding to z = —0.67 lies 
0.67 standard deviations below the mean, so 


xX = mean — (0.67)(standard deviation) 
= 170 — (0,67)(30) = 149.9 


The first quartile of blood cholesterol levels in 14-year-old boys is about 150 mg/dL. = 


Area = 0.25 


2 = —-0.67 
x=? 
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* The StatBoards video, Finding the Value Given a Proportion, provides 


the details of finding the percentiles (values given in a proportion) for a 
Normal distribution using the standard Normal curve. 


Summary 


93 


Apply Your Knowledge 


3.13 Table A. Use Table A to find the value z of a standard Normal variable that 
satisfies each of the following conditions. (Use the value of x from Table A 
that comes closest to satisfying the condition.) In each case, sketch a stan- 
dard Normal curve with your value of x marked on the axis. 

(a) The point z with 65% of the observations falling below it 
(b) The point z with 20% of the observations falling above it 


3.14 The Medical College Admissions Test. The total scores on the Medical 


College Admission Test (MCAT) in 2013 follow a Normal distribution with 
mean 25.3 and standard deviation 6.5. 


(a) What are the median and the first and third quartiles of the MCAT 
scores? What is the interquartile range? 
(b) Give the interval that contains the central 80% of the MCAT scores. 


CHAPTER 3 SUMMARY 


Chapter Specifics 


We can sometimes describe the overall pattern of a distribution by a density curve. A 
density curve has total area 1 underneath it. An area under a density curve gives the 
proportion of observations that fall in a range of values. 


A density curve is an idealized description of the overall pattern of a distribution that 
smooths out the irregularities in the actual data. We write the mean of a density curve 
as yz and the standard deviation of a density curve as o to distinguish them from the 
mean x and standard deviation s of the actual data. 


The mean, the median, and the quartiles of a density curve can be located by eye. The 
mean p is the balance point of the curve. The median divides the area under the curve 
in half. The quartiles and the median divide the area under the curve into quarters. 
The standard deviation o cannot be located by eye on most density curves. 


The mean and median are equal for symmetric density curves. The mean of a skewed 
curve is located farther toward the long tail than is the median. 


The Normal distributions are described by a special family of bell-shaped, symmetric 
density curves, called Normal curves. The mean pw and standard deviation @ com- 
pletely specify a Normal distribution N(, 7). The mean is the center of the curve, and 
a is the distance from pw to the change-of-curvature points on either side. 


To standardize any observation x, subtract the mean of the distribution and then divide 
by the standard deviation. The resulting z-score 
el 
z= 
o 


says how many standard deviations x lies from the distribution mean. 


All Normal distributions are the same when measurements are transformed to the 
standardized scale. In particular, all Normal distributions satisfy the 68-95-99.7 rule, 


1 THE BELL CURVE? 
, Does the distribution of 
human intelligence follow the “bell 
curve” of a Normal distribution? 
Scores on IQ tests do roughly 
follow a Normal distribution. That 
is because a test score is calculated 
from a person’s answers in a way that 
is designed to produce a Normal 
distribution. To conclude that 
intelligence follows a bell curve, 
we must agree that the test scores 
directly measure intelligence. Many 
psychologists don’t think there is one 
human characteristic that we can 
call “intelligence” and can measure 
by a single test score. 
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which describes what percent of observations lie within one, two, and three standard 
deviations of the mean, respectively. 


m Ifx has the N(w, o) distribution, then the standardized variable x = (x — )/o has 
the standard Normal distribution N(0, 1) with mean 0 and standard deviation 1. 
Table A gives the cumulative proportions of standard Normal observations that are 
less than x for many values of x. By standardizing, we can use Table A for any Normal 
distribution. 


Link It 


When exploring data, some data sets can be shown to closely follow the Normal distribu- 
tion. When this is true, the description of the data can be greatly simplified without much 
loss of information. We can calculate the percentage of the distribution in an interval for 
any Normal distribution if we know its mean and standard deviation. This also shows why 
the mean and standard deviation can be important numerical summaries. For distributions 
that are approximately Normal, these two numerical summaries give a complete description 
of the distribution of our data. It is important to remember that not all distributions can 
be well approximated by a Normal curve. In these cases, calculations based on the Normal 
distribution can be misleading. 

Normal distributions are also good approximations to many kinds of chance outcomes 
such as the proportion of heads in many tosses of a coin—this setting will be described in 
more detail in Chapter 22. And when we discuss statistical inference in Parts III and IV of 
the text, we will find that many procedures based on Normal distributions work well for 
other roughly symmetric distributions. 


() LaunchPad Online Resources 


If you are having difficulty with any of the sections of this chapter, these online 
resources should help prepare you to solve the exercises at the end of this chapter. 


° StatTutor starts with a video review of each section and asks a series of 
questions to check your understanding. 


* LearningCurve provides you with a series of questions about the chapter 
geared to your level of understanding. 


CHECK YOUR SKILLS 


Sal 


3.16 


3.18 


Which of these variables is most likely to have a Normal 
distribution? 


(a) Income per person for 150 different countries 
(b) Sale prices of 200 homes in Santa Barbara, CA 
(c) Lengths of 100 newborns in Connecticut 


To completely specify the shape of a Normal distribu- 
tion, you must give 


(a) the mean and the standard deviation. 

(b) the five-number summary. 

(c) the median and the quartiles. 

Figure 3.15 shows a Normal curve. The mean of this 
distribution is 


(a)0. = (b) 2. (c) 3. 


The standard deviation of the Normal distribution in 
Figure 3.15 is 


(b) 3. 


(a) 2. (c) 5. 
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FIGURE 3.15 
A Normal curve, for Exercises 3.17 and 3.18. 


CHAPTER 3 EXERCISES 


3.25 


3.26 


Understanding density curves. Remember that it 
is areas under a density curve, not the height of the 
curve, that give proportions in a distribution. To illus- 
trate this, sketch a density curve that has a tall, thin 
peak at 0 on the horizontal axis but has most of its 
area close to | on the horizontal axis without a high 
peak at 1. 


Daily activity. It appears that people who are mildly 
obese are less active than leaner people. One study 
looked at the average number of minutes per day that 
people spend standing or walking.!° Among mildly 
obese people, minutes of activity varied according to 
the N(373, 67) distribution. Minutes of activity for 
lean people had the N(526, 107) distribution. Within 
what limits do the active minutes for about 95% of the 
people in each group fall? Use the 68-95—99.7 rule. 


Sao 
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The length of human pregnancies from conception to 
birth varies according to a distribution that is approxi- 
mately Normal with mean 266 days and standard devia- 
tion 16 days. About 95% of all pregnancies last between 


(a) 250 and 282 days. (b) 234 and 298 days. 
(c) 218 and 314 days. 


The scores of adults on an IQ test are approximately 
Normal with mean 100 and standard deviation 15. 
The organization MENSA, which calls itself “the high 
IQ society,” requires an IQ score of 130 or higher for 
membership. What percent of adults would qualify for 
membership? 


(a) 95% (b) 5% (c) 2.5% 
The scores of adults on an IQ test are approximately 


Normal with mean 100 and standard deviation 15. 
Clara scores 132 on such a test. Her z-score is about 


(a) 2.13. (b) 2.80. (c) 8.47. 


The proportion of observations from a standard Normal 
distribution that take values greater than 1.45 is about 


(a) 0.9265. (b) 0.0735. (c) 0.0808. 


The proportion of observations from a standard Normal 
distribution that take values less than —1.25 is about 


(a) 0.1151. (b) 0.1056. (c) 0.8849, 


The scores of adults on an IQ test are approximately 
Normal with mean 100 and standard deviation 15. 
Clara scores 132 on sucha test. She scores higher than 
what percent of all adults? 


(a) About 10% (b) About 90% 
(c) About 98% 


Low IQ test scores. Scores on the Wechsler Adult 
Intelligence Scale (WAIS) are approximately Normal 
with mean 100 and standard deviation 15. People with 
WAIS scores below 70 are considered intellectually 
disabled when, for example, applying for Social Security 
disability benefits. According to the 68—95-99.7 rule, 
about what percent of adults are intellectually disabled 
by this criterion? 


Standard Normal drill. Use Table A to find the pro- 
portion of observations from a standard Normal dis- 
tribution that falls in each of the following regions. In 
each case, sketch a standard Normal curve and shade 
the area representing the region. 


@) es =2)E (lp) ¢ = =2,15 
(d) —2.15 < zx < 1.57 


(@) 2 = ILS7 
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Standard Normal drill. 


(a) Find the number z such that the proportion of 
observations that are less than x in a standard Normal 
distribution is 0.3. 

(b) Find the number z such that 35% of all observa- 
tions from a standard Normal distribution are greater 
than z. 


Fruit flies. The common fruit fly Drosophila melano- 
gaster is the most studied organism in genetic research 
because it is small, is easy to grow, and reproduces 
rapidly. The length of the thorax (where the wings 
and legs attach) in a population of male fruit flies is 
approximately Normal with mean 0.800 millimeters 
(mm) and standard deviation 0.078 mm. 
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(a) What proportion of flies have thorax length less 
than 0.6 mm? 

(b) What proportion have thorax length greater than 
0.9 mm? 

(c) What proportion have thorax length between 

0.6 mm and 0.9 mm? 


Acid rain? Emissions of sulfur dioxide by industry set 
off chemical changes in the atmosphere that result 
in “acid rain.” The acidity of liquids is measured by 
pH on a scale of 0 to 14. Distilled water has pH 7.0, 
and lower pH values indicate acidity. Normal rain is 
somewhat acidic, so acid rain is sometimes defined as 
rainfall with a pH below 5.0. The pH of rain at one lo- 
cation varies among rainy days according to a Normal 
distribution with mean 5.43 and standard deviation 
0.54. What proportion of rainy days have rainfall with 
pH below 5.0? 


Runners. In a study of exercise, a large group of male 
runners walk on a treadmill for 6 minutes. Their heart 


3.33 
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3.38 


rates in beats per minute at the end vary from runner 
to runner according to the N(104, 12.5) distribution. 
The heart rates for male nonrunners after the same 
exercise have the N(130, 17) distribution. 


(a) What percent of the runners have heart rates 
above 135? 

(b) What percent of the nonrunners have heart rates 
above 135? 


A milling machine. Automated manufacturing oper- 
ations are quite precise but still vary, often with distri- 
butions that are close to Normal. The width in inches 
of slots cut by a milling machine follows approximately 
the N(0.8750, 0.0012) distribution. The specifications 
allow slot widths between 0.8725 and 0.8775 inch. 
What proportion of slots meet these specifications? 


Body mass index. Your body mass index (BMI) is 
your weight in kilograms divided by the square of your 
height in meters. Many online BMI calculators allow 
you to enter weight in pounds and height in inches. 
High BMI is a common but controversial indicator of 
overweight or obesity. A study by the National Center 
for Health Statistics found that the BMI of American 
young men (ages 20 to 29) is approximately Normal 
with mean 26.8 and standard deviation 5.2."! 


(a) People with BMI less than 18.5 are often classi- 
fied as “underweight.” What percent of men aged 20 
to 29 are underweight by this criterion? 

(b) People with BMI over 30 are often classified as 
“obese.” What percent of men aged 20 to 29 are obese 
by this criterion? 

Miles per gallon. In its Fuel Economy Guide for 2014 
model vehicles, the Environmental Protection Agency 
gives data on 1160 vehicles. There are a number of high 
outliers, mainly hybrid gas—electric vehicles. If we ignore 
the vehicles identified as outliers, however, the combined 
city and highway gas mileage of the other 1134 vehicles 
is approximately Normal with mean 22.2 miles per gallon 
(mpg) and standard deviation 5.2 mpg. Exercises 3.35 to 
3.38 concern this distribution. 

I love my bug! The 2014 Volkswagen Beetle with a 
four-cylinder 1.8 L engine and automatic transmission 
has combined gas mileage of 28 mpg. What percent of 
all vehicles have better gas mileage than the Beetle? 
The top 5%. How high must a 2014 vehicle’s gas mile- 
age be to fall in the top 5% of all vehicles? 

The middle half. The quartiles of any distribution are 
the values with cumulative proportions 0.25 and 0.75. 
They span the middle half of the distribution. What 
are the quartiles of the distribution of gas mileage? 
Quintiles. The quintiles of any distribution are the 
values with cumulative proportions 0.20, 0.40, 0.60, 
and 0.80. What are the quintiles of the distribution of 
gas mileage? 
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What’s your percentile? Reports on a student’s test 
score such as the SAT or a child’s height or weight usu- 
ally give the percentile as well as the actual value of the 
variable. The percentile is just the cumulative propor- 
tion stated as a percent: the percent of all values of the 
variable that were lower than this one. The upper arm 
lengths of females in the United States are approximately 
Normal with mean 35.8 cm and standard deviation 
2.1 cm, and those for males are approximately Normal 
with mean 39.1 cm and standard deviation 2.3 cm. 


(a) Larry, a 60-year-old male in the United States, 
has an upper arm length of 37.2 cm. What is his 
percentile? 

(b) Measure your upper arm length to the nearest tenth 
of a centimeter, referring to Exercise 3.5 (page 84) for 
the measurement instructions. What is your arm length 
in centimeters? What is your percentile? 


Perfect SAT scores. It is possible to score higher 
than 1600 on the combined mathematics and read- 
ing portions of the SAT, but scores 1600 and above 
are reported as 1600. The distribution of SAT scores 
(combining mathematics and reading) in 2013 was 
close to Normal with mean 1011 and standard devia- 
tion 216. What proportion of SAT scores for these two 
parts were reported as 1600? (that is, what proportion 
of SAT scores were actually higher than 1600?) 


Heights of women. The heights of women aged 20 to 29 
follow approximately the N(64.2, 2.8) distribution. Men 
the same age have heights distributed as N(69.4, 3.0). 
What percent of women aged 20 to 29 are taller than the 
mean height of men aged 20 to 29? 


Weights aren’t Normal. The heights of people of the 
same sex and similar ages follow a Normal distribution 
reasonably closely. Weights, on the other hand, are 
not Normally distributed. The weights of women aged 
20 to 29 in the United States have mean 161.9 pounds 
and median 149.4 pounds. The first and third quartiles 
are 126.3 pounds and 181.2 pounds. What can you say 
about the shape of the weight distribution? Why? 


A surprising calculation. Changing the mean and 
standard deviation of a Normal distribution by a mod- 
erate amount can greatly change the percent of ob- 
servations in the tails. Suppose a college is looking for 
applicants with SAT math scores 750 and above. 


(a) In 2013, the scores of men on the math SAT 
followed the N(531, 121) distribution. What percent 
of men scored 750 or better? 

(b) Women’s SAT math scores that year had the 
N(499, 112) distribution. What percent of women 
scored 750 or better? You see that the percent of men 
above 750 is almost three times the percent of women 
with such high scores. Why this is true is contro- 
versial. (On the other hand, women score higher 
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than men on the new SAT writing test, though by a 
smaller amount.) 


Grading managers. Some companies “grade on a bell 
curve” to compare the performance of their managers 
and professional workers. This forces the use of some 
low performance ratings so that not all workers are 
listed as “above average.” Ford Motor Company’s “per- 
formance management process” for this year assigned 
10% A grades, 80% B grades, and 10% C grades to 
the company’s managers. Suppose Ford’s performance 
scores really are Normally distributed. This year, man- 
agers with scores less than 25 received C grades and 
those with scores above 475 received A grades. What 
are the mean and standard deviation of the scores? 


Osteoporosis. Osteoporosis is a condition in which 
the bones become brittle due to loss of minerals. To 
diagnose osteoporosis, an elaborate apparatus mea- 
sures bone mineral density (BMD). BMD is usually 
reported in standardized form. The standardization is 
based on a population of healthy young adults. The 
World Health Organization (WHO) criterion for os- 
teoporosis isa BMD 2.5 standard deviations below the 
mean for healthy young adults. BMD measurements in 
a population of people similar in age and sex roughly 
follow a Normal distribution. 


Solid 


bone matrix 


Weakened 
bone matrix 


© Nucleus Medical Art, Inc/Phototake 


(a) What percent of healthy young adults have osteo- 
porosis by the WHO criterion? 

(b) Women aged 70 to 79 are of course not young 
adults. The mean BMD in this age is about —2 on the 
standard scale for young adults. Suppose the stan- 
dard deviation is the same as for young adults. What 
percent of this older population have osteoporosis? 


In later chapters we will meet many statistical procedures 
that work well when the data are “close enough to Nor- 
mal.” Exercises 3.46 to 3.50 concern data that are mostly 
close enough to Normal for statistical work, whereas Exer- 
cise 3.51 concerns data for which the data are not close to 
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Normal. These exercises ask you to do data analysis and 
Normal calculations to investigate how close to Normal 
real data are. 


Normal is only approximate: IQ test scores. Here 
are the IQ test scores of 31 seventh-grade girls in a 
Midwest school district: !” MIDWSTIQ 


114 100 104 89 102 SL ie 
OOS OCS OZ CRS? Sei 2S 
11S 9 SOm 2 weal Ar Sul 
107 = 103 93 Ses IW sil 93 


(a) We expect IQ scores to be approximately Nor- 
mal. Make a stemplot to check that there are no 
major departures from Normality. 

(b) Nonetheless, proportions calculated from a Nor- 
mal distribution are not always very accurate for small 
numbers of observations. Find the mean x and standard 
deviation s for these IQ scores. What proportions of the 
scores are within one standard deviation and within two 
standard deviations of the mean? What would these 
proportions be in an exactly Normal distribution? 
Normal is only approximate: ACT scores. Composite 
scores on the ACT test for the 2013 high school gradu- 
ating class had mean 20.9 and standard deviation 5.4. 
In all, 1,799,243 students in this class took the test. Of 
these, 170,777 had scores higher than 28, and another 
56,351 had scores exactly 28. ACT scores are always 
whole numbers. The exactly Normal N(20.9, 5.4) dis- 
tribution can include any value, not just whole num- 
bers. What is more, there is no area exactly above 28 
under the smooth Normal curve. So ACT scores can be 
only approximately Normal. To illustrate this fact, find 


(a) the percent of 2013 ACT scores greater than 28 
using the actual counts reported. 

(b) the percent of 2013 ACT scores greater than or 
equal to 28, using the actual counts reported. 

(c) the percent of observations that are greater than 
28 using the N(20.9, 5.4) distribution. (The percent 
greater than or equal to 28 is the same, because there 
is no area exactly over 28.) 

Are the data Normal? Acidity of rainfall. Exercise 
3.31 concerns the acidity (measured by pH) of rain- 


fall. A sample of 105 rainwater specimens had mean 
pH 5.43, standard deviation 0.54, 


and five-number summary 4.33, 5.05, 722.4 792.2 
5.44, 5.79, 6.81.2 736.8 806.4 
(a) Compare the mean and median 866.2 869.4 
and also the distances of the two 877.6 803.8 
quartiles from the median. Does it 728.7 958.1 
appear that the distribution is quite (EL 133 
symmetric? Why? 1020.5 810.0 
(b) If the distribution is really 887.0 653.1 
N(5.43, 0.54), what proportion of 852.4, 735.6 
observations would be less than 784.7 785.0 
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861.3 
784.8 
823.5 
976.2 
868.6 
923.4 
858.1 
913.6 
955.9 
896.6 


5.05? Less than 5.79? Do these proportions suggest 
that the distribution is close to Normal? Why? 


Are the data Normal? SAT mathematics scores. Geor- 
gia Southern University (GSU) had 3219 students 
with regular admission in its freshman class of 2013. 
For each student, data is available on their SAT and 
ACT scores, if taken, high school GPA, and the col- 
lege within the university to which they were admit- 
ted.'* Here are the first 20 SAT mathematics scores 
from that data set: jy) saTMaTH 


650 490 580 450 570 540 510 530 
510 560 560 590 470 690 530 570 
460 590 530 490 


The complete data is in the file SATMATH, which con- 
tains both the original and ordered mathematics scores. 


(a) Make a histogram of the distribution (if your 
software allows it, superimpose a normal curve over 

the histogram as in Figure 3.1). Although the resulting 
histogram depends a bit on your choice of classes, the 
distribution appears roughly symmetric with no outliers. 
(b) Find the mean, median, standard deviation, and 
quartiles for these data. Comparing the mean and the 
median and comparing the distances of the two quar- 
tiles from the median suggest that the distribution is 
quite symmetric. Why? 

(c) In 2013, the mean score on the mathematics 
portion of the SAT for all college-bound seniors was 
514. If the distribution were exactly Normal with the 
mean and standard deviation you found in part (b), 
what proportion of GSU freshmen scored above the 
mean for all college-bound seniors? 

(d) Compute the exact proportion of GSU freshmen 
who scored above the mean for all college-bound 
seniors. It will be simplest to use the ordered scores in 
the SATMATH file to calculate this. How does this 
percentage compare with the percentage calculated 
in part (c)? Despite the discrepancy, this distribution 
is “close enough to Normal” for statistical work in 
later chapters. 

Are the data Normal? Monsoon rains. Here are the 
amounts of summer monsoon rainfall (millimeters) for 


India in the 100 years 1901 to 2000: MONSOON 


750.6 716.8 885.5 777.9 897.5 889.6 935.4 
Pike Msi Hii NCO IL, SISO 7D 
863.0 804.0 903.1 853.5 768.2 821.5 804.9 
913.8 843.9 908.7 842.4 908.6 789.9 853.6 
yaoi Liles SOC) RSS) HES) SO teh 7f3) 
885.8 930.5 983.6 789.0 889.6 944.3 839.9 
922.8 709.6 740.2 8603 754.8 831.3 940.0 
748.3 963.0 857.0 883.4 909.5 708.0 882.9 
836.9 760.0 743.2 697.4 961.7 866.9 908.8 
938.4 8264 857.3 870.5 873.8 827.0 770.2 
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Number in sample 
0 20 40 60 80 100 120 140 160 


(a) Make a histogram of these rainfall amounts. Find 
the mean and the median. 

(b) Although the distribution is reasonably Normal, 

your work shows some departure from Normality. In 
what way are the data not Normal? 


Are the data Normal? Weight of females in their 20s. 
Many body measurements of people of the same sex and 
similar ages such as height and upper arm length fol- 
low a Normal distribution reasonably closely. Weights, 
on the other hand, are not Normally distributed. The 
NHANES survey of 2009-10! includes the weights of 
a representative sample of 548 females in the United 
States aged 20 to 29. The mean of the weights was 
161.58 pounds, and the standard deviation was 48.96 
pounds. Figure 3.16 gives a histogram of the data along 
with a smooth curve representing an N(161.58, 48.96) 
distribution. From the figure, the Normal curve does 
not appear to follow the pattern in the histogram that 
closely. Because of this, the use of areas under the Nor- 
mal curve may not provide a good approximation to 
weights in various intervals. £j),) FEMWEIGH 


je te 


(a) Using the data file on the text website, what 
proportion of females aged 20 to 29 weighed under 


350 400 
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Weight in pounds 


200 


FIGURE 3.16 
Histogram of the weights of 548 females aged 20 to 29 in the 2009-10 
NHANES survey, with a Normal curve superimposed, for Exercise 3.51. 
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100 pounds? What percent of the N(161.58, 48.96) 
distribution is below 100? 

(b) What proportion of females aged 20 to 29 weighed 
over 250 pounds? What percent of the N(161.58, 48.96) 
distribution is above 250? 

(c) Based on your answers in parts (a) and (b), do 
you think it is a good idea to summarize the distribu- 
tion of weights by an N(161.58, 48.96) distribution? 


The Normal Density Curve applet allows you to do Nor- 
mal calculations quickly. It is somewhat limited by the 
number of pixels available for use, so that it can’t hit ev- 
ery value exactly. In the following exercises, use the closest 
available values. In each case, make a sketch of the curve 
from the applet marked with the values you used to answer 
the questions. 


How accurate is 68-95-99.7? The 68-95-99.7 rule 
for Normal distributions is a useful approximation. 
To see how accurate the rule is, drag one flag across 
the other so that the applet shows the area under the 
curve between the two flags. 


(a) Place the flags one standard deviation on either 
side of the mean. What is the area between these 
two values? What does the 68—95—99.7 rule say this 
area is? 

(b) Repeat for locations two and three standard devia- 
tions on either side of the mean. Again compare the 
68-95-99.7 rule with the area given by the applet. 


Where are the quartiles? How many standard devia- 
tions above and below the mean do the quartiles of 
any Normal distribution lie? (Use the standard Normal 
distribution to answer this question.) 


Grading managers. In Exercise 3.44, we saw that 
Ford Motor Company once graded its managers in 
such a way that the top 10% received an A grade, 
the bottom 10% a C, and the middle 80% a B. Let’s 
suppose that performance scores follow a Normal 
distribution. How many standard deviations above 
and below the mean do the A/B and B/C cutoffs lie? 
(Use the standard Normal distribution to answer this 
question.) 


3.55 Are the data Normal? Comparing quartiles. The website http: // 
research.collegeboard.org/programs/sat/data presents data for 
high school seniors who participated in the SAT Program from the current 
year as well as previous years. Click on the link for College Bound Seniors for 
the most recent year given. In the window that opens, click on the link for To- 
tal Group Report for this year. The Total Group Profile will open and contains 
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several tables, each giving different summary information. Go to the table for 
Overall Mean Scores. How many students took the Critical Reading portion of 
the SAT? What were the mean and standard deviation of the scores? Assuming 
the distribution of scores is Normal with the mean and standard deviation given 
in the Overall Mean Scores table, what would be the first and third quartiles of 
the distribution? Now go to the table for Percentiles for the Total Group, and compare 
the actual first and third quartiles from the data to the values obtained from the 
Normal curve. Does this give any evidence that the distribution of Critical Reading 
scores is not Normal? 


Are the data Normal? Comparing proportions. The website http: // 
research.collegeboard.org/programs/sat/data presents data for 
high school seniors who participated in the SAT Program from the current year 
and previous years. Click on the link for College Bound Seniors for the most recent 
year given. In the window that opens, click on the link for Total Group Report for 
this year. The Total Group Profile will open and contains several tables, each giv- 
ing different summary information. Go to the table for Overall Mean Scores. How 
many students took the Critical Reading portion of the SAT? What were the mean 
and standard deviation of the scores? Now go to the table on the Score Distribu- 
tion. The scores are broken into intervals 200-290, 300-390, and so on. Using the 
Total column, find the actual percentage of students who scored in each of the six 
intervals reported for the Critical Reading portion of the SAT: Now, assuming the 
distribution of scores is Normal with the mean and standard deviation given in the 
Overall Mean Scores table, find the area under the Normal curve for each interval. 
Because of the discreteness of the actual scores, take the interval 200-290 as the 
interval 200-300, the interval 300-390 as the interval 300-400, and so on, when 
finding the areas under the Normal curve. How do the actual percentages compare 
to the areas under the Normal curve? Does this give any evidence that the distribu- 
tion of Critical Reading scores is not Normal? 


© Reinhard Dirscherl/age fotostock 


medical study finds that short women are more likely to have heart attacks 

than women of average height, whereas tall women have the fewest heart 

attacks. An insurance group reports that heavier cars have fewer deaths per 
10,000 vehicles registered than do lighter cars. These and many other statistical 
studies look at the relationship between two variables. Statistical relationships are 
overall tendencies, not ironclad rules. They allow individual exceptions. Although 
smokers on the average die younger than nonsmokers, some people live to 90 while 
smoking three packs a day. 

To understand a statistical relationship between two variables, we measure both 
variables on the same individuals. Often, we must examine other variables as well. 
To conclude that shorter women have higher risk from heart attacks, for example, 
the researchers had to eliminate the effect of other variables such as weight and 
exercise habits. In this and the following chapter, we study relationships between 
variables. One of our main themes is that the relationship between two variables 
can be strongly influenced by other variables that are lurking in the background. 


We think that car weight helps explain accident deaths and that smoking influ- 
ences life expectancy. In each of these relationships, the two variables play different 
roles: one explains or influences the other. 


4.1 


4.2 


4.3 
4.4 


4.5 


4.6 


Explanatory and response 
variables 


Displaying relationships: 
scatterplots 


Interpreting scatterplots 


Adding categorical variables 
to scatterplots 


Measuring linear association: 
correlation 


Facts about correlation 
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AFTER YOU PLOT 


i 0 

dik YOUR DATA, THINK! 
The statistician Abraham Wald 
(1902-1950) worked on war problems 
during World War II. Wald invented 
some statistical methods that were 
military secrets until the war ended. 
Here is one of his simpler ideas. Asked 
where extra armor should be added to 
airplanes, Wald studied the location of 
enemy bullet holes in planes returning 
from combat. He plotted the locations 
on an outline of the plane. As data 
accumulated, most of the outline 
filled up. Put the armor in the few 
spots with no bullet holes, said Wald. 
That’s where bullets hit the planes 
that didn’t make it back. 
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Response Variable, Explanatory Variable 


A response variable measures an outcome of a study. An explanatory variable 
may explain or influence changes in a response variable. 


You will often find explanatory variables called independent variables or predictor 
variables and response variables called dependent variables. The idea behind this 
language is that the response variable depends on the explanatory variable. Because 
“independent” and “dependent” have other meanings in statistics that are unrelated 
to the explanatory—response distinction, we prefer to avoid those words. 

It is easiest to identify explanatory and response variables when we actually set 
values of one variable to see how it affects another variable. 


How does drinking beer affect the level of alcohol in our blood? The legal limit for 
driving in all states is 0.08%. Student volunteers at The Ohio State University drank 
different numbers of cans of beer. Thirty minutes later, a police officer measured their 
blood alcohol content. Number of beers consumed is the explanatory variable, and 
percent of alcohol in the blood is the response variable. = 


When we don’t set the values of either variable but just observe both variables, 
there may or may not be explanatory and response variables. Whether there are 
depends on how we plan to use the data. 


A college student aid officer looks at the findings of the National Student Loan Survey. 
She notes data on the amount of debt of recent graduates, their current income, and 
how stressed they feel about college debt. She isn’t interested in predictions but is 
simply trying to understand the situation of recent college graduates. The distinction 
between explanatory and response variables does not apply. 

A sociologist looks at the same data with an eye to using amount of debt and income, 
along with other variables, to explain the stress caused by college debt. Now amount of 
debt and income are explanatory variables, and stress level is the response variable. m 


In many studies, the goal is to show that changes in one or more explanatory 
variables actually cause changes in a response variable. Other explanatory—response 
relationships do not involve direct causation. Nations with more television sets per 
person have greater life expectancies, but shipping many television sets to Botswana 
won’t cause life expectancy to increase. 

Most statistical studies examine data on more than one variable. Fortunately, 
statistical analysis of several-variable data builds on the tools we used to examine 
individual variables. The principles that guide our work also remain the same: 


Plot your data. Look for overall patterns and deviations from those patterns. 


Based on what your plot shows, choose numerical summaries for some aspects 
of the data. 


4.2 Displaying Relationships: Scatterplots 103 


Apply Your Knowledge 


4.1 Explanatory and Response Variables? You have data on a large group 
of college students. Here are four pairs of variables measured on these stu- 
dents. For each pair, is it more reasonable to simply explore the relationship 
between the two variables, or to view one of the variables as an explanatory 
variable and the other as a response variable? In the latter case, which is the 
explanatory variable, and which is the response variable? 


(a) Number of lectures attended in your statistics course and grade on the 
final exam for the course 


(b) Number of hours per week spent exercising and calories burned per 
week 


(c) Hours per week spent online using Facebook and grade point average 


(d) Hours per week spent online using Facebook and IQ 


4.2 Coral Reefs. How sensitive to changes in water temperature 
are coral reefs? To find out, scientists examined data on sea sur- 
face temperatures and coral growth per year at locations in the 
Red Sea.! What are the explanatory and response variables? 
Are they categorical or quantitative? 


4.3. Beer and Blood Alcohol. Example 4.1 describes a study in 
which college students drank different amounts of beer. The 
response variable was their blood alcohol content (BAC). 
BAC for the same amount of beer might depend on other 
facts about the students. Name two other variables that could 
influence BAC. 


© Georgie Holland/age fotostock 


The most useful graph for displaying the relationship between two quantitative 
variables is a scatterplot. 


Who takes a test may infuence the average score. Let’s follow our four-step process 


Figure 1.8 (page 27) reminded us that in some states most high school graduates 
take the SAT to test their readiness for college, and in other states most take the ACT. 
(page 63) to examine this influence.” ely 


(STATE) The percent of high school students who take the SAT varies from state 
to state. Does this fact help explain differences among the states in average SAT = MATHSAT 
Mathematics score? 


“PLAN: Examine the relationship between percent taking the SAT and state mean 
score on the Mathematics part of the SAT. Choose the explanatory and response vari- 
ables. Make a scatterplot to display the relationship between the variables. Interpret 
the plot to understand the relationship. 


BOR EIGELoRuce sen We suspect that “percent taking” will help explain “mean 


score.” So “percent taking” is the explanatory variable, and “mean score” is the 
response variable. We want to see how mean score changes when percent taking 
changes, so we put percent taking (the explanatory variable) on the horizontal axis. 
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Figure 4.1 is the scatterplot. Each point represents a single state. In Colorado, for 
example, 14% took the SAT, and their mean SAT Math score was 581. Find 14 on 
the x (horizontal) axis and 581 on the y (vertical) axis. Colorado appears as the point 
(14,581) above 14 and to the right of 581. 


ESSER We will explore conclusions in Example 4.4. = 


FIGURE 4.1 

Scatterplot of the mean SAT Math- 
ematics score in each state against 

the percent of that state's high school 
graduates who take the SAT, for 
Example 4.3. The dotted lines intersect 
at the point (14, 581), the data point 
for Colorado. 
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Percent of graduates taking the SAT 


Scatterplot 


A scatterplot shows the relationship between two quantitative variables measured 
on the same individuals. The values of one variable appear on the horizontal axis, and 
the values of the other variable appear on the vertical axis. Each individual in the data 
appears as the point in the plot fixed by the values of both variables for that individual. 


Always plot the explanatory variable, if there is one, on the horizontal axis (the x 
axis) of a scatterplot. As a reminder, we usually call the explanatory variable x and 
the response variable y. If there is no explanatory-response distinction, either variable 
can go on the horizontal axis. 


(9) LaunchPad Online Resources 


* The StatBoards video, Creating and Interpreting Scatterplots, provides details 


of making a scatterplot through an example. 


Apply Your Knowledge 


4.4 Death by Intent. Homicide and suicide are both intentional means of 


ending a life. However, the reason for committing a homicide is different 
from that for suicide, and we might expect homicide and suicide rates to 
be uncorrelated. On the other hand, both can involve some degree of vio- 
lence, so perhaps we might expect some level of correlation in the rates. 
Here are data from 2008-10 for 26 counties in Ohio.’ Rates are per 100,000 


people. ely) DEATH 


4.3 Interpreting Scatterplots [1 O05 


Homicide Suicide Homicide Suicide 

County Rate Rate County Rate Rate 
Allen 4.2 9.5 Lorain 3.1 11.0 
Ashtabula 1.8 15.5 Lucas 7.4 13.3 
Butler 2.6 2A Mahoning 10.9 12.4 
Clermont 1.0 16.0 Medina 0.5 10.0 
Clark 5.6 14.5 Miami 2.6 9.2 
Columbiana 3.5 16.6 Montgomery 9.5 15.2 
Cuyahoga 9.2 95 Portage 1.6 9.6 
Delaware 0.8 7.6 Stark 4.7 13.5 
Franklin 8.7 11.4 Summit 4.9 11.5 
Greene Zl 12.8 Trumbull 5.8 16.6 
Hamilton 8.9 10.8 Warren 0.7 11.3 
Lake 1.8 11.3 Wayne 1.8 8.9 
Licking 4.5 12.9 Wood 1.0 7.4 

Make a scatterplot to examine whether homicide and suicide rates are cor- 

related. For these data we are simply interested in exploring the relationship & 

between the two variables, so neither variable is an obvious choice for the 


explanatory variable. For convenience, use homicide rate as the explanatory 
variable and suicide rate as the response. (The Two-Variable Statistical Calcu- 
lator Applet provides an easy way to make scatterplots. Click “Data” to enter 
your data, then “Scatterplot” to see the plot.) 


4.5 Outsourcing by Airlines. Airlines have increasingly outsourced the main- 
tenance of their planes to other companies. A concern voiced by critics is 
that the maintenance may be less carefully done, so that outsourcing cre- 
ates a safety hazard. In addition, flight delays are often due to maintenance 
problems, so one might look at government data on percent of major mainte- 
nance outsourced and percent of flight delays blamed on the airline to deter- 
mine if these concerns are justified. This was done, and data from 2005 and 
2006 appeared to justify the concerns of the critics. Do more recent data still 
support the concerns of the critics? Here are data from 2012:4 AIRLINE 


Outsource Delay Outsource Delay 
Airline Percent Percent Airline Percent Percent 
Alaska 52.1 10.51 JetBlue 66.3 19.60 
American 25st 20.41 Southwest 60.8 19.58 
Delta 41.3 12.46 United 47.3 23.17 
Frontier 0.1 19.80 US Airways 56.5 10.80 
Hawaiian 79.1 4.85 


Make a scatterplot that shows the relation between delays and outsourcing. 


To interpret a scatterplot, adapt the strategies of data analysis learned in Chapters 1 
and 2 to the new two-variable setting. Describe the overall pattern of the plot by 
its direction, form, and strength. Direction indicates whether the overall pattern 
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MATHSAT 


clusters 


moves from lower left to upper right, from upper right to lower left, or neither. Form 
refers to the approximate functional form. For example, is it roughly a straight 
line, is it curved, or does it oscillate in some way? Strength refers to how closely the 
points in the plot follow the form. If they fall almost perfectly on a straight line, we 
say there is a strong straight line relationship. If they are widely scattered around a 
straight line, we say the relationship is weak. 


Examining a Scatterplot 


In any graph of data, look for the overall pattern and for striking deviations from 
that pattern. 


You can describe the overall pattern of a scatterplot by the direction, form, and 
strength of the relationship. 


An important kind of deviation is an outlier, an individual value that falls outside the 
overall pattern of the relationship. 


Be careful not to confuse ways we describe patterns for distributions of a 
single variable, such as symmetric or skewed, with ways we describe patterns in 
scatterplots. 


We continue to explore the state SAT Mathematics scores by interpreting what the 
scatterplot tells us about the variation in scores from state to state. 


Figure 4.1 shows a clear direction: the overall pattern 
moves from upper left to lower right. That is, states in which higher percents of high 
school graduates take the SAT tend to have lower mean SAT Mathematics scores. We 
call this a negative association between the two variables. 

The form of the relationship is roughly a straight line with a slight curve to the right 
as it moves down. What is more, most states fall into two distinct clusters. As in the 
histogram in Figure 1.8, the ACT states cluster at the left and the SAT states at the 
right. In 23 states, fewer than 20% of seniors took the SAT; in another 26 states, 
more than 45% took the SAT. 

The strength of a relationship in a scatterplot is determined by how closely the 
points follow a clear form. The overall relationship in Figure 4.1 is moderately strong: 
states with similar percents taking the SAT tend to have roughly similar mean SAT 
Math scores. 


SOMES Percent taking explains much of the variation among states in aver- 
age SAT Mathematics score. States in which a higher percent of students take the 
SAT tend to have lower mean scores because the mean includes a broader group of 
students. SAT states as a group have lower mean SAT scores than ACT states. So 
average SAT score says almost nothing about the quality of education in a state. It is 
foolish to “rank” states by their average SAT scores. m 


When discussing the direction of the relationship between two variables, we will 
use the word association. Association and relationship are often treated as synonyms 
by statisticians. 


Positive Association, Negative Association 


Two variables are positively associated when above-average values of one tend to 
accompany above-average values of the other, and below-average values also tend 


to occur together. 


Two variables are negatively associated when above-average values of one tend to 


accompany below-average values of the other, and vice versa. 


Of course, not all relationships have a clear direction that we can describe as 
positive association or negative association. Exercise 4.8 gives an example that does 
not have a single direction. Here is an example of a strong positive association with 


a simple and important form. 


Interpreting Scatterplots 
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Douglas Faulkner/Science Source 


(STATE? Manatees are large, gentle, slow-moving creatures found along the coast 
of Florida. Many manatees are injured or killed by boats. Table 4.1 contains data 
on the number of boats registered in Florida (in thousands) and the number of 
manatees killed by boats for the years between 1977 and 2013.° Examine the 
relationship. Is it plausible that restricting the number of boats would help protect 
manatees? 


of the relationship. 


PI Make a scatterplot with “boats registered” as the explanatory variable and 
“manatees killed” as the response variable. Describe the form, direction, and strength 


YEAR BOATS MANATEES YEAR BOATS MANATEES YEAR BOATS MANATEES 
1977 447 13 1990 719 47 2003 978 73 
1978 460 Dil IO9il 681 533) 2004 983 69 
1979 481 24 1992 679 38 2005 1010 79 
1980 498 16 1993 678 35) 2006 1024 OD; 
1981 513 24 1994 696 49 2007 1027 73 
1982 5? 20 1995 713 Ay 2008 1010 90 
1983 526 15 1996 732 60 2009 982 97 
1984 559 34 1997 755 54 2010 942 83 
1985 585 33 1998 809 66 2011 922 88 
1986 614 38) 1999 830 82 2012 902 81 
1987 645 39 2000 880 78 2013 897 72 
1988 675 43 2001 Mat 81 

1989 711 50 2002 962 95 


ESRI Figure 4.2 is the scatterplot. There is a positive association—more boats 
goes with more manatees killed. This form is a linear relationship. That is, the 
overall pattern follows a straight line from lower left to upper right. The relationship 
is strong because the points don’t deviate greatly from a line. 


linear relationship 
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FIGURE 4.2 

Scatterplot of the number of 
Florida manatees killed by boats in 
the years 1977 to 2013 against 
the number of boats registered in 
Florida that year, for Example 4.5. 
There is a strong linear (straight- 
line) pattern. 
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Boats registered in Florida (thousands) 


[SSSR As more boats are registered, the number of manatees killed by boats 
goes up linearly. Data from the Florida Wildlife Commission indicate that in 2013 
boats accounted for 9% of manatee deaths and 13% of deaths whose causes could 


be determined. Although many manatees die from other causes, it appears that fewer 
boats would mean fewer manatee deaths. m 


other variables lurking in the background might contribute to the relationship 

displayed in a scatterplot. Because both boats registered and manatees killed 
are recorded year by year, any change in conditions over time might affect the rela- 
tionship. For example, if boats in Florida have tended to go faster over the years, 
that might result in more manatees killed by the same number of boats. 


A As the following chapter will emphasize, it is wise to always ask what 


(9) LaunchPad Online Resources 


* The Snapshots video, Correlation and Causation, discusses scatterplots, 
correlation, and their interpretation. 


* The StatBoards video, Creating and Interpreting Scatterplots, provides details 
of interpreting a scatterplot through an example. 


Apply Your Knowledge 


4.6 Death by Intent. Describe the direction, form, and strength of the relation- 
ship between homicide rate and suicide rate, as displayed in your plot for 
Exercise 4.4. Are there any deviations from the overall pattern? DEATH 


4.7 Outsourcing by Airlines. Does your plot for Exercise 4.5 show a positive, 
negative, or no association between maintenance outsourcing and delays 
caused by the airline? If it shows association, is the relationship very strong? 
Are there any outliers? El) AIRLINE 


4.4 Adding Categorical Variables to Scatterplots 


4.8 Does Fast Driving Waste Fuel? How does the fuel consumption of a car 
change as its speed increases? Here are data for a British Ford Escort. Speed is 
measured in kilometers per hour, and fuel consumption is measured in liters 
of gasoline used per 100 kilometers traveled.® FASTDR 


Speed 10 20 30 40 50 60 70 80 
Fuel 21.00 13.00 10.00 800 7.00 5.90 630 6.95 


Speed 90 100 110 120 130 140 150 
Fuel eo 8.27 BOY Scie IMO) er ss 


(a) Make a scatterplot. (Which is the explanatory variable?) 


(b) Describe the form of the relationship. It is not linear. Explain why the 
form of the relationship makes sense. 


(c) It does not make sense to describe the variables as either positively 
associated or negatively associated. Why? 


(d) Is the relationship reasonably strong or quite weak? Explain your answer. 


The U.S. Census Bureau groups the states into four broad regions, named 
Midwest, Northeast, South, and West. We might ask about regional patterns in 
SAT exam scores. Figure 4.3 repeats part of Figure 4.1, with an important differ- 
ence: we have plotted only the Midwest and Northeast groups of states, using circles 
for the Midwest states and plus signs for the Northeast states. 
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The regional comparison is striking. The 9 Northeast states are all SAT states— 
at least 61% of high school graduates in each of these states take the SAT. The 
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FIGURE 4.3 

Mean SAT Mathematics score and 
percent of high school graduates who 
take the test for only the Midwest (0) 
and Northeast (+) states. 
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13 Midwest states are mostly ACT states. In 11 of these states, fewer than 7% of high 
school graduates take the SAT. One Midwest state is clearly an outlier within the 
region: Indiana is an SAT state (70% take the SAT) that falls close to the Northeast 
cluster. Ohio, where 17% take the SAT, also lies outside the Midwest cluster. 

Dividing the states into regions introduces a third variable into the scatterplot. 
“Region” is a categorical variable that has four values, although we plotted data 
from only two of the four regions. The two regions are identified by the two differ- 
ent plotting symbols. 


Categorical Variables in Scatterplots 


To add a categorical variable to a scatterplot, use a different plot color or symbol for 
each category. 


(3) LaunchPad Online Resources 


* The Video Technology Manuals discuss how to add categorical variables to 
a scatterplot using software for several statistical software packages. 


Apply Your Knowledge 


4.9 Death by Intent The data described in Exercise 4.4 also indicated that the 
homicide rates for some counties should be treated with caution because of 
low counts: ely) DEATH2 


Homicide Suicide Homicide Suicide 
County Rate Rate Caution| County Rate Rate Caution 
Allen 4.2 9.2 Y Lorain Ball 11.0 We 
Ashtabula 1.8 15.5 Y. Lucas 7.4 13.3 N 
Butler 2.6 IVA Y¢ Mahoning 10.9 12.4 N 
Clermont 1.0 16.0 Y Medina 0.5 10.0 Y 
Clark 5.6 14.5 N Miami 2.6 9.2 Y 
Columbiana 35 16.6 N Montgomery 9.5 15.2 N 
Cuyahoga 9.2 9.5 N Portage 1.6 9.6 Y 
Delaware 0.8 7.6 Y Stark 4.7 13.5 N 
Franklin 8.7 11.4 N Summit 4.9 15) N 
Greene 2.7 12.8 Y Trumbull 5.8 16.6 N 
Hamilton 8.9 10.8 N Warren Ont eS ny; 
Lake 1.8 11.3 Y Wayne 1.8 8.9 Y 
Licking 4.5 12.9 N Wood 1.0 7.4 Y 


(a) Make a scatterplot of homicide rate versus suicide rate for all 26 counties. 
Use separate symbols to distinguish counties for which the homicide rate 
is to be treated with caution from those for which there is no caution. 


(b) Does the same overall pattern hold for both types of counties? What is 
the most important difference between the two types of counties? 
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A scatterplot displays the direction, form, and strength of the relationship 
between two quantitative variables. Linear (straight-line) relations are particularly 
important because a straight line is a simple pattern that is quite common. A linear 
relation is strong if the points lie close to a straight line, and weak if they are widely 
scattered about a line. Our eyes are not good judges of how strong a linear relation- 
ship is. The two scatterplots in Figure 4.4 depict exactly the same data, but the lower 
plot is drawn smaller in a large field. The lower plot seems to show a stronger linear 
relationship. Our eyes can be fooled by changing the plotting scales or the amount 
of space around the cloud of points in a scatterplot.’ We need to follow our strategy 
for data analysis by using a numerical measure to supplement the graph. Correlation 
is the measure we use. 
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Correlation 


The correlation measures the direction and strength of the linear relationship be- 
tween two quantitative variables. Correlation is usually written as r. 


Suppose that we have data on variables x and y for n individuals. The values for the 
first individual are x, and y,, the values for the second individual are x, and y, and 


FIGURE 4.4 

Two scatterplots of the same data. 
The straight-line pattern in the lower 
plot appears stronger because of the 
surrounding space. 


ac 
aH} DEATH FROM 
wii SUPERSTITION? 


Is there a relationship between 
superstitious beliefs and bad things 
happening? Apparently there is. 
Chinese and Japanese people think 
that the number 4 is unlucky because 
when pronounced it sounds like 

the word for “death.” Sociologists 
looked at 15 years’ worth of death 
certificates for Chinese and Japanese 
Americans and for white Americans. 
Deaths from heart disease were 
notably higher on the fourth day 

of the month among Chinese and 
Japanese but not among whites. The 
sociologists think the explanation is 
increased stress on “unlucky days.” 
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so on. The means and standard deviations of the two variables are x and s, for the 
x-values, and y and s, for the y-values. The correlation r between x and y is 


Sa a a a | 


or, more compactly, 


The formula for the correlation r is a bit complex. It helps us see what correlation 
is, but in practice you should use software or a calculator that finds r from keyed-in 
values of two variables, x and y. Exercise 4.10 asks you to calculate a correlation 
step-by-step from the definition to solidify its meaning. 

The formula for r begins by standardizing the observations. Suppose, for example, 
that x is height in centimeters and y is weight in kilograms and that we have height 
and weight measurements for n people. Then x and s, are the mean and standard 
deviation of the n heights, both in centimeters. The value 


Xp X 


Ss 


is the standardized height of the ith person, from Chapter 3. The standardized 
height says how many standard deviations above or below the mean a person’s 
height lies. Standardized values have no units—in this example, they are no longer 
measured in centimeters. Standardize the weights also. The correlation r is an aver- 
age of the products of the standardized height and the standardized weight for all 
the individuals. Just as in the case of the standard deviation s, the “average” here 
divides by one fewer than the number of individuals. 


() LaunchPad Resources 


° The StatBoards video, Computing a Correlation, provides details of calculating 
a correlation through an example. 


* The Video Technology Manuals discuss how to compute the correlation 
using software for several statistical software packages. 


Apply Your Knowledge 


4.10 Coral Reefs. Exercise 4.2 discusses a study in which scientists examined 
data on mean sea surface temperatures (in degrees Celsius) and mean coral 
growth (in millimeters per year) over a several-year period at locations in the 
Red Sea. Here are the data:§ ely) CORAL 


Sea Surface Temperature 29.68 29.87 30.16 30.22 30.48 30.65 30.90 
Growth 2.63 2.58 2.68 2.60 2.48 2.38 2.26 


(a) Make a scatterplot. Which is the explanatory variable? The plot shows 
a negative linear pattern. 
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(b) Find the correlation r step by step. You may wish to round off to two 
decimal places in each step. First, find the mean and standard devia- 
tion of each variable. Then, find the seven standardized values for 
each variable. Finally, use the formula for r. Explain how your value for 
ry matches your graph in part (a). 

(c) Enter these data into your calculator or software, and use the correla- 
tion function to find r. Check that you get the same result as in part 
(b), up to roundoff error. 


The formula for correlation helps us see that r is positive when there is a positive 
association between the variables. Height and weight, for example, have a positive 
association. People who are above average in height tend also to be above average 
in weight. Both the standardized height and the standardized weight are positive. 
People who are below average in height tend also to have below-average weight. 
Then both standardized height and standardized weight are negative. In both cases, 
the products in the formula for r are mostly positive, and so r is positive. In the same 
way, we can see that r is negative when the association between x and y is negative. 
More detailed study of the formula gives more detailed properties of r. Here is what 
you need to know to interpret correlation. 


4. Correlation makes no distinction between explanatory and response variables. It 
makes no difference which variable you call x and which you call y in calculat- 
ing the correlation. 


2. Because r uses the standardized values of the observations, r does not change 
when we change the units of measurement of x, y, or both. Measuring height in 
inches rather than centimeters and weight in pounds rather than kilograms 
does not change the correlation between height and weight. The correlation r 
itself has no unit of measurement; it is just a number. 


3. Positive r indicates positive association between the variables, and negative r indicates 
negative association. 


4. The correlation r is always a number between —1 and 1. Values of r near O in- 
dicate a very weak linear relationship. The strength of the linear relationship 
increases as r moves away from O toward either —1 or 1. Values of r close to 
—1 or | indicate that the points in a scatterplot lie close to a straight line. The 
extreme values r = —1 and r = 1 occur only in the case of a perfect linear 
relationship, when the points lie exactly along a straight line. 


The scatterplots in Figure 4.5 illustrate how values of r closer to 1 or —1 correspond 
to stronger linear relationships. To make the meaning of r clearer, the standard devia- 
tions of both variables in these plots are equal, and the horizontal and vertical scales 
are the same. In general, it is not so easy to guess the value of r from the appearance 
of a scatterplot. Remember that changing the plotting scales in a scatterplot may 
mislead our eyes, but it does not change the correlation. 
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FIGURE 4.5 


How correlation measures the strength of a linear relationship, for Example 4.6. Patterns closer to a straight 
line have correlations closer to 1 or —1. 


The scatterplots in Figure 4.6 show four sets of real data. The patterns are less 
regular than those in Figure 4.5, but they also illustrate how correlation measures the 
strength of linear relationships.° 


(a) This repeats the manatee plot in Figure 4.2. There is a strong positive linear rela- 
tionship, r = 0.954. 


(b) Here are the number of named tropical storms each year between 1984 and 
2013 plotted against the number predicted before the start of hurricane season 


arenas by William Gray of Colorado State University. There is a moderate linear relation- 


ship, r = 0.609. 


(c) These data come from an experiment that studied how quickly cuts in the limbs 
of newts heal. Each point represents the healing rate in micrometers (millionths 
of a meter) per hour for the two front limbs of the same newt. This relationship 
is weaker than those in parts (a) and (b), with r = 0.358. 


(d) Does last year’s stock market performance help predict how stocks will do this 
year? No. The correlation between last year’s percent return and this year’s per- 
cent return over 56 years is only r = —0.081. The scatterplot shows a cloud of 
points with no visible linear pattern. m 
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FIGURE 4.6 
How correlation measures the strength of a linear relationship, for Example 4.6. Four sets of real data with 
(a) r= 0.954, (b) r= 0.609, (c) r= 0.358, and (d) r= —0.081. 
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Last year’s percent return 


(d) 


Describing the relationship between two variables is a more complex task than 


describing the distribution of one variable. Here are some more facts about correla- 
tion, cautions to keep in mind when you use 7: 


Correlation requires that both variables be quantitative, so that it makes 
sense to do the arithmetic indicated by the formula for r. We cannot calcu- 
late a correlation between the incomes of a group of people and what 


city they live in, because city is a categorical variable. 
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tween two variables. Correlation does not describe curved relationships 
between variables, no matter how strong they are. Exercise 4.13 illustrates 
this important fact. 


3. A Like the mean and standard deviation, the correlation is not resistant: r is 


2. A Correlation measures the strength of only the linear relationship be- 


strongly affected by a few outlying observations. Use r with caution when 
outliers appear in the scatterplot. Reporting the correlation both with 
the outliers included and with the outliers removed is informative. 


4. Correlation is not a complete summary of two-variable data, even when the 
relationship between the variables is linear. You should give the means 
and standard deviations of both x and y along with the correlation. 


Because the formula for correlation uses the means and standard deviations, 
these measures are the proper choice to accompany a correlation. Here is an exam- 
ple in which understanding requires both means and correlation. 


One website recommends that fans of the television show American Idol score con- 
testants at home on a scale from 1 to 10, with higher scores indicating a better per- 
formance. Two friends, Angela and Elizabeth, decide to follow this advice and score 
contestants over the course of a season. How well do they agree? We calculate that 
the correlation between their scores is r = 0.9, suggesting that they agree. But the 
mean of Angela's scores is 0.8 point lower than Elizabeth’s mean. Does this suggest 
that the two friends disagree? 

These facts do not contradict each other. They are simply different kinds of infor- 
mation. The mean scores show that Angela awards lower scores than Elizabeth. 
But because Angela gives every contestant a score about 0.8 point lower than 
Elizabeth, the correlation remains high. Adding the same number to all values of 
either x or y does not change the correlation. Angela and Elizabeth actually score 
consistently because they agree on which performances are better. The high r 
shows their agreement. m 


Of course, even giving means, standard deviations, and the correlation for state 
SAT scores and percent taking will not point out the clusters in Figure 4.1. Numerical 
summaries complement plots of data, but they don’t replace them. 


Apply Your Knowledge 


4.11 Changing the Units. The sea surface temperatures in Exercise 4.10 are 
measured in degrees Celsius and growth is in millimeters per year. The cor- 
relation between sea surface temperature and coral growth is r = —0.8636. 
If the measurements were made in degrees Fahrenheit and inches per year, 
would the correlation change? Explain your answer. 


4.12 Changing the Correlation. Use your calculator, software, or the Two-Variable 
Statistical Calculator applet to demonstrate how outliers can affect correlation. 


(a) What is the correlation between homicide rate and suicide rate for the 
26 counties in Exercise 4.4? £jy)) DeaTH 


(b) Make a scatterplot of the data with one new point added. Point A: 
homicide rate 30, suicide rate 30. Find the new correlation for the 
original data plus Point A. 


(c) Make a scatterplot of the data with a different new point added. Point 
B: homicide rate 14, suicide rate 6.5. Find the new correlation for the 
original data plus Point B. DEATH3 


(d) By looking at your plot, explain why adding Point A makes the correla- 
tion stronger (closer to 1) and adding Point B makes the correlation 
weaker (closer to 0). 


4.13 Strong Association But No Correlation. The gas mileage of an automo- 
bile first increases and then decreases as the speed increases. Suppose this 
relationship is very regular, as shown by the following data on speed (miles 
per hour) and mileage (miles per gallon): £j),) Mee 


Speed 30 40:«50-—s«60-—s 70 
Mileage 24 28 30 28 24 


Make a scatterplot of mileage versus speed. Show that the correlation be- 
tween speed and mileage is r = 0. Explain why the correlation is 0 even 
though there is a strong relationship between speed and mileage. 


CHAPTER 4 SUMMARY 


Chapter Specifics 


To study relationships between variables, we must measure the variables on the same 
group of individuals. 


If we think that a variable x may explain or even cause changes in another variable y, 
we call x an explanatory variable and y a response variable. 


A scatterplot displays the relationship between two quantitative variables measured on 
the same individuals. Mark values of one variable on the horizontal axis (x axis) and 
values of the other variable on the vertical axis (y axis). Plot each individual’s data as 
a point on the graph. Always plot the explanatory variable, if there is one, on the x axis 
of a scatterplot. 


Plot points with different colors or symbols to see the effect of a categorical variable in 
a scatterplot. 


In examining a scatterplot, look for an overall pattern showing the direction, form, and 
strength of the relationship and then for outliers or other deviations from this pattern. 


Direction: If the relationship has a clear direction, we speak of either positive associa- 
tion (high values of the two variables tend to occur together) or negative association 
(high values of one variable tend to occur with low values of the other variable). 


Form: Linear relationships, where the points show a straight-line pattern, are an 
important form of relationship between two variables. Curved relationships and clus- 
ters are other forms to watch for. 


Strength: The strength of a relationship is determined by how close the points in the 
scatterplot lie to a simple form such as a line. 


The correlation r measures the direction and strength of the linear association 
between two quantitative variables x and y. Although you can calculate a correlation 
for any scatterplot, r measures only straight-line relationships. 


Summary 


17 
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m Correlation indicates the direction of a linear relationship by its sign: r > O for a 
positive association and r < 0 for a negative association. Correlation always satisfies 
—1 =r S1 and indicates the strength of a relationship by how close it is to —1 or 1. 
Perfect correlation, r = +1, occurs only when the points on a scatterplot lie exactly 
on a straight line. 


m Correlation ignores the distinction between explanatory and response variables. The 
value of r is not affected by changes in the unit of measurement of either variable. 
Correlation is not resistant, so outliers can greatly change the value of r. 


Link It 


In Chapters | to 3, we focused on exploring features of a single variable. In this chapter, we 
continued our study of exploratory data analysis but for the purpose of examining relation- 
ships between variables. A useful tool for exploring the relationship between two variables 
is the scatterplot. When the relationship is linear, correlation is a numerical measure of the 
strength of the linear relationship. 

It is tempting to assume that the patterns we observe in our data hold for values of our 
variables that we have not observed—in other words, that additional data would continue 
to conform to these patterns. The process of identifying underlying patterns would seem 
to assume that this is the case. But is this assumption justified? Parts II to V of the book 
answer this question. 


() LaunchPad Online Resources 


If you are having difficulty with any of the sections of this chapter, these online 
resources should help prepare you to solve the exercises at the end of this chapter. 


° StatTutor starts with a video review of each section and asks a series of 
questions to check your understanding. 


* LearningCurve provides you with a series of questions about the chapter 
geared to your level of understanding. 
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4.14 Researchers collect data on 5,134 American adults 4.18 What are all the values that a correlation r can pos- 
younger than 60. They measure the reaction times (in sibly take? 
seconds) of each subject to a stimulus on a computer (a) r=0 (b) O=r=1 (Cray = 1 
screen and how many years later the subject died.!° ; ; 
Se eae crate ater ee Spee He ea 4.19 If the correlation between two variables is close to 0, 
nae you can conclude that a scatterplot would show 
time can predict time to death (in years). When you 
make a scatterplot, the explanatory variable on the x axis (a) a strong straight-line pattern. 
(Ceierenctonteae (b) a cloud of points with no visible pattern. 
(ej ithoeme ae (c) no straight-line pattern, but there might be a strong 
2 f her form. 
(c) can be either reaction time or time to death. ee once, ne 
4.20 i i i 
4.15 The researchers in Exercise 4.14 found that people Bee One eames re ce ere a aes 
line. The correlation between x and y is close to 
with slower reaction times tended to die sooner. In 
a scatterplot of the reaction time and the number of (a) —1. (b) 1. (©) either —1 or 1, we can’t say which. 
years to death, you expect to see 4.21 A statistics professor warns her class that her second 
(a) a positive association. midterm is always harder than the first. She tells her 
eye class that students always score 10 points worse on the 
(b) very little association. p 
(c) a negative association. second midterm compared to their score on the first 
s Wav hat th lati 
4.16 Figure 4.7 is a scatterplot of school GPA against IQ Dee aie rae ee tats pee 
test scores for 15 seventh-grade students. There is one : 
low outlier in the plot. The IQ and GPA scores for this (a) 1. (b) —1. (©) Can'ttell without seeing the data. 
student are 4.22 Researchers asked mothers how much soda (in ounces) 
(a) 10 = 05, GPA = 103. their kids drank in a typical day. They also asked 
(b) 10 = 103, GPA = 0.5. these mothers to rate how aggressive their kids were 
(©) 10 = 103, GPA = 76 ona scale of 1 to 10, with larger values corresponding 
= to a greater degree of aggression.'! The correlation 
4.17 If we leave out the low outlier, the correlation for the e 6 be ; 
eed Serene ATisel between amount of soda consumed and aggression 
Pee 6 OS Eee gee Ooes tre rating was found to be r = 0.3. If the researchers had 
(a) 0.9. (5) = 0.9, (c) 0.1. measured amount of soda consumed in liters instead 
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Scatterplot of school GPA against IQ test 80 90 100 io bo BO 
scores for seventh-grade students, for IQ test score 


Exercises 4.16 and 4.17. 
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of ounces, what would be the correlation? (There are 
35 ounces in a liter.) 
(a) 0.3/35 = 0.009 
(c) (0.3)(35) = 10.5 
Researchers measured the percent body fat and the pre- 


ferred amount of salt (percent weight/volume) for several 
children. Here are data for seven children:' SALT 


(b) 0.3 


Preferred amount 
of salt x 


CO 03 OF OF Oho Ole) iil 


Percent body fat y 203 0 223 0 SS 23 0) 


Use your calculator or software: The correlation between 
percent body fat and preferred amount of salt is about 


(a) r = 0.08. (b) 1 = 03. (Oi — 0:8: 
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4.24 


Second-round score 


Scores at the Masters. The Masters is one of the four 
major golf tournaments. Figure 4.8 is a scatterplot of 
the scores for the first two rounds of the 2013 Masters 
for all the golfers entered. Only the 60 golfers with 
the lowest two-round total advance to the final two 
rounds (unless several people are tied for 60th place, 
in which case all those tied for 60th place advance). 
The plot has a grid pattern because golf scores must 
be whole numbers.” £yy)) Mastr13 


(a) Read the graph: What was the lowest score in 
the first round of play? How many golfers had this 
low score? What were their scores in the second 
round? 

(b) Read the graph: Alan Dunbar had the highest 
score in the second round. What was this score? 
What was Dunbar’s score in the first round? 

(c) Is the correlation between first-round scores and 
second-round scores closest to r = 0.01, r = 0.25, 

r = 0.75, or r = 0.99? Explain your choice. Does 
the graph suggest that knowing a professional golfer’s 
score for one round is much help in predicting his 
score for another round on the same course? 
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FIGURE 4.8 
Scatterplot of the scores in the first two rounds of the 2013 Masters 
Tournament, for Exercise 4.24. 


4.25 Happy states. Human happiness or well-being can be 


State mean BRFSS score 


assessed either subjectively or objectively. Subjective 
assessment can be accomplished by listening to what 
people say. Objective assessment can be made from 
data related to well-being such as income, climate, 
availability of entertainment, housing prices, lack of 
traffic congestion, and so on. Do subjective and ob- 
jective assessments agree? To study this, investiga- 
tors made both subjective and objective assessments 
of happiness for each of the 50 states. The subjective 
measurement was the mean score on a life-satisfaction 
question found on the Behavioral Risk Factor Surveil- 
lance System (BRFSS), which is a state-based system of 
health surveys. Lower scores indicate a greater degree 
of happiness. To objectively assess happiness, the in- 
vestigators computed a mean well-being score (called 
the compensating-differentials score) for each state, 
based on objective measures that have been found to 
be related to happiness or well-being. The states were 
then ranked according to this score (Rank 1 being the 
happiest). Figure 4.9 is a scatterplot of mean BRFSS 
scores (response) against the rank based on the com- 
pensating-differentials (explanatory). HAPPY 
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FIGURE 4.9 
Scatterplot of mean BRFSS score in each state against each state's well- 
being rank, for Exercise 4.25. 


(a) Is there an overall positive association or an overall 
negative association between mean BRFSS score and 
rank based on the compensating- differentials method? 
(b) Does the overall association indicate agreement 
or disagreement between the mean subjective BRFSS 
score and the ranking based on objective data used in 


Exercises 


121 


(c) If time in days were replaced by time in number of 
weeks until death began in each later group (fractions 
allowed so that 4 days becomes 4/7 weeks), would 

the correlation between distance and time change? 
Explain your answer. 


age fotostock/Superstock 


4.28 Sparrowhawk colonies. One of nature’s patterns 
the compensating-differentials method? connects the percent of adult birds in a colony that 
(c) Are there any outliers? If so, what are the BRFSS retum: fom the 
scores corresponding to these outliers? previous year and 
4.26 Wine and cancer in women. Some studies have sug- thenumberofnew 
gested that a nightly glass of wine may not only take the adults that join the 
edge off a day but also improve health. Is wine good for colony. Here are 
your health? A study of nearly 1.3 million middle-aged datator 13 colonics 
British women examined wine consumption and the of sparrowhawks:!7 § 
risk of breast cancer. The researchers were interested in i cpRAROW 
how risk changed as wine consumption increased. Risk 
is based on breast cancer rates 
in dinlkeis ake Go ibisag;  POrect meu 74 66 81 52 73 62 52 45 62 46 60 46 38 
cancer rates in nondrinkers in New adults 5 6 & il WM 15 ie WW it i I) aq De 
the study, with higher values 
indicating greater risk. In particular, a value greater (a) Plot the count of new adults (response) against 
than 1 indicates a greater breast cancer rate than that the percent of returning birds (explanatory). Describe 
of nondrinkers. Wine intake is the mean wine intake, in the direction and form of the relationship. Is the cor- 
grams of alcohol per day (1 glass of wine is approximately relation r an appropriate measure of the strength of 
10 grams of alcohol), of all women in the study who this relationship? If so, find r. 
drank approximately the same amount of wine per week. (b) For short-lived birds, the association between 
Here are the data (for drinkers only): these variables is positive: changes in weather 
and food supply drive the populations of new and 
Wine intake x (grams of alcohol per day) 2.5 8.5 15.5 26.5 returning birds up or down together. For long- 
Relative risk y COO Rm IEO Chan eam lE29) lived territorial birds, on the other hand, the asso- 
ciation is negative because returning birds claim 
(a) Make a scatterplot of these data. Based on the . ee pare imipeicclons and fee Pikere 
scatterplot, do you expect the correlation to be posi- role fore vary ee suince Tatil aque elbecion 
tive 1 negative? Near +1 or not? -_ the sparrowhawk? 
(b) Find the correlation r between wine intake and 4.29 Our brains don’t like losses. Most people dislike 


4.27 


relative risk. Do the data show that women who 
consume more wine tend to have higher relative risks 
of breast cancer? 

Ebola and gorillas. The deadly Ebola virus is a threat 
to both people and gorillas in Central Africa. An out- 
break in 2002 and 2003 killed 91 of the 95 gorillas in 
seven home ranges in the Congo. To study the spread 
of the virus, measure “distance” by the number of 
home ranges separating a group of gorillas from the 
first group infected. Here are data on distance and 


Distance 1 3 4 4 4 b) 


losses more than they like gains. In money terms, 
people are about as sensitive to a loss of $10 as to a 
gain of $20. To discover what parts of the brain are 
active in decisions about gain and loss, psychologists 
presented subjects with a series of gambles with dif- 
ferent odds and different amounts of winnings and 
losses. From a subject’s choices, they constructed a 
measure of “behavioral loss aversion.” Higher scores 
show greater sensitivity to losses. Observing brain 
activity while subjects made their decisions pointed 
to specific brain regions. Here are data for 16 sub- 
jects on behavioral loss aversion and “neural loss 
aversion,” a measure of activity in one region of the 


in:/8 &y,.) 

Time 4 21 33 41 43 46 a 
ee ee ee ee ee eo ek 
the explanatory variable? What Behavioral 0.81 001 012 068 O11 036 0.34 
kind of pattern does your plot show? 

(b) Find the correlation r between Neural 20.7 12 155 28.8 41.7 ops 1552 
distance amdl dine. Behavioral 0.68 0.99 1.04 0.66 0.86 129) 1.94 
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(a) Make a scatterplot that shows how behavior 
responds to brain activity. 

(b) Describe the overall pattern of the data. There is 
one clear outlier. What is the behavioral score associ- 
ated with this outlier? 

(c) Find the correlation r between neural and behav- 


(a) Make a scatterplot of the relative risk versus intake, 
using separate symbols for the two types of drinks. 

(b) What does your plot show about the pattern of 
risk? What does it show about the effect of type of 
drink on risk? 


4.32 Feed the birds. Canaries provide more food to their 
ioral loss aversion both with and without the outlier. babies when the babies beg more intensely. Research- 
Does the outlier have a strong influence O10 the value ers wondered if begging was the main factor determin- 
of ie By looking at your plot, explain why adding the ing how much food baby canaries receive, or if parents 
outlier to the other data points causes r to asec also take into account whether the babies are theirs or 
4.30 Sulfur, the ee and the sun. Sulfur in, the at- not. To investigate, researchers conducted an experi- 
mosphere affects climate by influencing formation of ment allowing canary parents to raise two broods: one 
clouds. The main natural source of sulfur es dimethyl of their own and one fostered from a different pair of 
sulfide (DMS) produced by small organisms in the up- parents. If begging determines how much food babies 
aa layers of the oceans. DMS production is in turn receive, then differences in the “begging intensities” 
influenced by the amount of energy the upper ocean of the broods should be strongly associated with differ- 
receives from sunlight. Here are monthly data on solar ences inthe amount of food the broods teceive, The 
radiation dose (SRD, in vets per square meter) and researchers decided to use the relative growth rates 
surface DMS a ee (in nanomolars) for a re- (the growth rate of the foster babies relative to that of 
gion in the Mediterranean: y| SULFUR the natural babies, with values greater than 1 indicat- 
ing that the foster babies grew more rap- 
SRD 2255) 12.91 14.34 WS) 72 PMNS 22.41 37.65 48.41 idly than the natural babies) as a measure 
DMS 0.796 0.692 1.744 1.062 0.682 Ile 0.736 0.720 ot tnevditicrance in thie aroun oF Tood 
SRD 74.41 94.14 109.38 = 157.79 262.67 = 268.96 ~—- 289.23 received. They recorded the difference 
DMS 1.820 1.099 2.692 5.134 8.038 7.280 8.872 in begging intensities (the begging inten- 
sity of the foster babies minus that of the 
natural babies) and relative growth rates. Here are data 
(a) Make a scatterplot that shows how DMS responds from the experiment: CANARY 
to SRD. 
(b) Describe the overall Difference in begging intensity 14.0 V5) 12.0 8.0 8.0 6.5 55) 
pattern of the data. Find Relative growth rate 0.85 100 133 085 090 115 41.00 
the eekcaa ces Difference in begging intensity 35) 3.0 2.0 LS; 165) 0.0 0.0 
Bern ae ‘Relative growth rate £30 ="133 103 9095 1S is 1.00 
Because SRD changes —= 
Withithe ceasonstor the Bisons in begging intensity 2.00 2.00 3.00 4.50 7.00 8.00 8.50 
year, the close rela- Relative growth rate 1.07 1.14 1.00 0.83 ilt5) 0.93 0.70 
cee between SD and UMS helps explain other (a) Make a scatterplot that shows how relative growth 
pe lea rate responds to the difference in begging intensity. 
4.31 Alcohol and cancer in women. Exercise 4.26 discusses a f ; ; 
: ; : : (b) Describe the overall pattern of the relationship. Is 
study of the relationship between wine consumption and paren % ; bes 
it linear? Is there a positive or negative association, or 
the risk of breast cancer in women. The researchers were siya VEE 
i neither? Find the correlation r. Is r a helpful descrip- 
also interested in how risk changed as consumption of : Pine ledoushin? 
ieonlie hea eae d. Intake of tion of this relationship? 
BOO Oe Fae ce OL Se Ue We are tance (c) If begging intensity is the main factor determin- 
alcoholic beverages other than wine is the mean intake, en, ' 
ing food received, with higher intensity leading to 
in grams of alcohol per day, of all women in the study more food, one would expect the relative growth 
who drank approximately the same amount of alcohol nae 
one LH sie ere rate to increase as the difference in begging inten- 
Se ee WOON Fae Neen Oe sity increases. However, if both begging intensity 
women who drank wine and women who drank alcoholic 
b pele ye and a preference for their own babies determine 
Se ita the amount of food received (and hence the rela- 
tive growth rate), we might expect growth rate to 
Wine intake (grams of alcohol per day) tos) teka) Sha) 40:5) increase initially as begging intensity increases but 
Reseda cid LOO 10S iis wp then to level off (or even decrease) as the parents 
begin to ignore increases in begging by the foster 
Alcohol intak f alcohol d 2.00 7.0 13.0 24.0 
= = = ake grams ofall) babies. Which of these theories do the data appear to 
Relative risk O96 OG ie 1e20 support? Explain your answer. 


4.33 Good weather 


and tipping. Fa- 
vorable weather 
has been shown 
to be associated 
with increased 
tipping. Will just 
the belief that B= i 
future weather will be favorable lead to higher tips? 
Researchers gave 60 index cards to a waitress at an 
Italian restaurant in New Jersey. Before delivering the 
bill to each customer, the waitress randomly selected 
a card and wrote on the bill the same message that 
was printed on the index card. Twenty of the cards 
had the message “The weather is supposed to be really 
good tomorrow. I hope you enjoy the day! ” Another 
20 cards contained the message “The weather is sup- 
posed to be not so good tomorrow. I hope you enjoy 
the day anyway!” The remaining 20 cards were blank, 
indicating that the waitress was not supposed to write 
any message. Choosing a card at random ensured that 
there was a random assignment of the diners to the 
three experimental conditions. Here are the percent- 
age tips for the three messages:” £yjy)| TIPPING 


= 
8 
2 
S 
= 
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(c) If drinking an additional gram of alcohol each day 

raised the relative risk of breast cancer by exactly 0.01, 
what would be the correlation between alcohol in wine 
intake and relative risk of breast cancer? (Hint: Draw a 
scatterplot for several values of alcohol in wine intake.) 
The effect of changing units. Changing the units 
of measurement can dramatically alter the appear- 
ance of a scatterplot. Return to the data on percent 
body fat and preferred amount of salt in Exercise 4.23: 


Ey) SALT2 


Preferred amount Oly (03) Or 0 ey OS} ll 
of salt x 
Percent body fat y MO 30 2 30 36 23 30 


In calculating the preferred amount of salt, the weight 
of the salt was in milligrams. A mad scientist decides 
to measure weight in tenths of milligrams. The same 
data in these units are 


Preferred amount 2 3 4 5 6 8 11 
of salt x 
Percent body fat y MO 30 Ph HS SHO) 


(a) Make a plot with the x axis 


WEATHER REPORT PERCENTAGE TIP extending from 0 to 12 and the y 
Good Fase Oe mean omeeie rose om te Pot them 
nal data on these axes. Then plot 
PINS) fig) FL TEKO) PUNE PO) NS IO) PET the new data using a different 
Bad 18.0 19.0 19.2 188 184 19.0 185 16.1 16.8 14.0 — color or symbol. The two plots 
17.0 136 17.5 199 202 188 18.0 23.2 182 19.4 look very different. 
None Dit eis eee ee 
is exactly the same for the two 
18.5 193 193 194 108 19.1 19.7 198 21.3 20.6 sets of measurements. Why do 
you know that this is true without 
(a) Make a plot of percentage tip against the weather doing any calculations? Find the two correlations to 
report on the bill (space the three weather reports verify that they are the same. 
equally on the horizontal axis). Which weather report 4.36 Statistics for investing. Investment reports now often 
appears to lead to the best tip? include correlations. Following a table of correlations 
(b) Does it make sense to speak of a positive or nega- among mutual funds, a report adds: “Two funds can 
tive association between weather report and percent- have perfect correlation, yet different levels of risk. 
age tip? Why? Is correlation ra helpful description of For example, Fund A and Fund B may be perfectly 
the relationship? Why? correlated, yet Fund A moves 20% whenever Fund B 
4.34 Thinking about correlation. Exercise 4.26 presents moves 10%.” Write a brief explanation, for someone 
data on wine intake and the relative risk of breast can- who knows no statistics, of how this can happen. In- 
cer in women. clude a sketch to illustrate your explanation. 
(a) If wine intake is measured in ounces of alcohol per 4.37 Statistics for investing. A mutual funds company’s news- 


day rather than grams per day, how would the correla- 
tion change? (There are 0.035 ounces in a gram.) 

(b) How would r change if all the relative risks were 
0.25 less than the values given in the table? Does 
the correlation tell us that among women who drink, 
those who drink more wine tend to have a greater 
relative risk of cancer than women who don’t drink 
at all? 


letter says, “A well diversified portfolio includes assets 
with low correlations.” The newsletter includes a table of 
correlations between the returns on various classes of in- 
vestments. For example, the correlation between munici- 
pal bonds and large-cap stocks is 0.50, and the correlation 
between municipal bonds and small-cap stocks is 0.21. 

(a) Rachel invests heavily in municipal bonds. She 
wants to diversify by adding an investment whose 
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returns do not closely follow the returns on her 
bonds. Should she choose large-cap stocks or small- 
cap stocks for this purpose? Explain your answer. 

(b) If Rachel wants an investment that tends to 
increase when the return on her bonds drops, what 
kind of correlation should she look for? 

Teaching and research. A college newspaper interviews 
a psychologist about student ratings of the teaching of 
faculty members. The psychologist says, “The evidence 
indicates that the correlation between the research pro- 
ductivity and teaching rating of faculty members is close 
to zero.” The paper reports this as “Professor McDan- 
iel said that good researchers tend to be poor teachers, 
and vice versa.” Explain why the paper’s report is wrong. 
Write a statement in plain language (don’t use the word 
correlation) to explain the psychologist’s meaning. 
Sloppy writing about correlation. Each of the fol- 
lowing statements contains a blunder. Explain in each 
case what is wrong. 


(a) “There is a high correlation between the sex of 
American workers and their income.” 

(b) “We found a high correlation (r = 1.09) between 
students’ ratings of faculty teaching and ratings made 
by other faculty members.” 

(c) “The correlation between height and weight of 
the subjects was r = 0.63 centimeter per kilogram.” 


More about scatterplots and correlation. Here are 
two sets of data. 


Data set A 
5 1 2, 3 
y 1 sy 05) 
Data set B 


Sed |iele le ae Se 
ied A CP SOs. a4 


oN 


(a) Make a scatterplot of both sets of data. Comment on 
any differences you see 


(b) Drag this last point down until it is opposite the 
group of 10 points. How small can you make the 
correlation? Can you make the correlation negative? 
You see that a single outlier can greatly strengthen or 
weaken a correlation. Always plot your data to check 
for outlying points. 


4.42 Match the correlation. You are going to use the Corre- 

A lation and Regression applet to make scatterplots with 

10 points that have correlation close to 0.7. The lesson 

is that many patterns can have the same correlation. 
Always plot your data before you trust a correlation. 


(a) Click on the scatterplot to add the first two 
points. What is the value of the correlation? Why 
does it have this value? 

(b) Make a lower-left to upper-right pattern of 

10 points with correlation about r = 0.7. (You can 
drag points up or down to adjust r after you have 
10 points.) Make a rough sketch of your scatterplot. 
(c) Make another scatterplot with nine points in 

a vertical stack at the left of the plot. Add one point 
far to the right, and move it until the correlation is 
close to 0.7. Make a rough sketch of your scatterplot. 
(d) Make yet another scatterplot with 10 points in 

a curved pattern that starts at the lower left, rises 

to the right, then falls again at the far right. Adjust 
the points up or down until you have a fairly smooth 
curve with correlation close to 0.7. Make a rough 
sketch of this scatterplot also. 


The following exercises ask you to answer questions from data with- 
out having the details outlined for you. The exercise statements give 
you the State step of the four-step process. In your work, follow 
the Plan, Solve, and Conclude steps of the process, described on 
page 63. 


4.43 Global warming. Have average global temperatures 
42 been increasing in recent years? Here are annual aver- 
age global temperatures for the last 20 years in degrees 


Celsius.” 
1996 1997 1998 2000 2001 2002 2003 
14.22) 14.42 14.54 i 14.33 14.45 14.51 14.52 


in the two plots. Year 1994 1995 
(b) Compute the cor- Temperature 14.23 14.35 
relation for both sets Year 2004 2005 
of data. Comment on Temperature 14.48 14.55 


2006 2007 2008 2010 2011 2012 2013 
14.50 14.49 14.41 : 14.56 14.43 14.48 14.52 


any differences in the 
two values. Are these 
differences what you would expect from the plots in 
part (a)? 

Correlation is not resistant. Go to the Correlation 
and Regression applet. Click on the scatterplot to cre- 
ate a group of 10 points in the lower-left corner of the 
scatterplot with a strong straight-line pattern (corre- 
lation about 0.9). 


(a) Add one point at the upper right that is in line 
with the first 10. How does the correlation change? 


Discuss what the data show about change in average 
global temperatures over time. 


4.44 Will women outrun men? Does the physiology of 
women make them better suited than men to long- 
distance running? Will women eventually outperform 
men in long-distance races? Researchers examined 
data on world record times (in seconds) for men and 
women in the marathon. Here are data for women:” 


Year 1926 1964 1967 1970 1971 1974 1975 
Time 13,222.0 11,973.0 11,246.0 10,973.0 9990.0 9834.5 9499.0 
Year 1977 1980 1981 1982 1983 1985 
Time 9287.5 9027.0 8806.0 8771.0 8563.0 8466.0 

Here are data for men: 
Year 1908 1909 1913 1920 1925 1935 1947 
Time 10,518.4 9751.0 9366.6 9155.8 8941.8 8802.0 8739.0 
Year 1952 1953 1954 1958 1960 1963 1964 
Time 8442.2 8314.8 8259.4 8117.0 8116.2 8068.0 7931.2 
Year 1965 1967 1969 1981 1984 1985 = 1988 
Time 7920.0 7776.4 7713.6 7698.0 7685.0 7632.0 7610.0 

(a) What do the data show about women’s and men’s 

times in the marathon? (Start by plotting both sets 

of data on the same plot, using two different plotting 

symbols.) 

(b) Based on these data, researchers (in 1992) 

predicted that women would outrun men in the 

marathon in 1998. How do you think they arrived 

at this date? Was their prediction accurate? (You 

may want to look on the web; try doing a Google 

search on “women’s world record marathon times.”) 

4.45 Toucan’s beak. The toco toucan, the largest member 


of the toucan family, possesses the largest beak relative 
to body size of all birds. This exaggerated feature has 
received various interpretations, such as being a refined 
adaptation for feeding. However, the large surface area 
may also be an important mechanism for radiating heat 
(and hence cooling the bird) as outdoor temperature 
increases. Here are 
data for beak heat 


loss, as a percent 
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and physical pain in similar ways. Psychologists first 
included and then deliberately excluded individuals 
from a social activity while they measured changes in 
brain activity. After each activity, the subjects filled out 
questionnaires that assessed how excluded they felt. 
Here are data for 13 subjects: Ey), 


SOCIAL BRAIN 
SUBJECT DISTRESS ACTIVITY 

il 1.26 —(OH05%) 
2, 1.85 —0.040 
3 1.10 —0.026 
4 U0) = 0.017 
5 2.17 —0.017 
6 2.67 0.017 
7 2.01 0.021 
8 2.18 0.025 
9 2.58 0.027 
10 stl) 0.033 
11 HET (S) 0.064 
12 3):3)3) 0.077 
13 3.65 0.124 


The explanatory variable is “social distress” measured by 
each subject’s questionnaire score after exclusion relative 
to the score after inclusion. (So values greater than 1 show 
the degree of distress caused by exclusion.) The response 
variable is change in activity in a region of the brain that 
is activated by physical pain. Negative values show a de- 
crease in activity, suggesting less distress. Discuss what the 


£ data show about the relationship between social distress 
of total body heat 3 i 
| ; x and brain activity. 
oss, at various = 
temperatures in 3 4.47 Yukon squirrels. The population density of North 
degrees Celsius:”4 3 American red squirrels in the Yukon, Canada, fluc- 
1j,) TOUCAN © tuates annually. Researchers believe one reason for 
the fluctuation may be the availability of 
Temperature (°C) iS 16 i i78 @ 0 mm wp white spruce cones in the spring, a sig- 
Percent heat loss from beak B) 345 0 935" 98 BR 46s 55. 5 nificant source of food for the squirrels. 
Tompomene ce) 3 24 25 26 27 28 29 30 To explore this, researchers measured red 
Percent heat loss from beak 43 52 45 53 58 60 62 62 squirrel population density in the spring 


4.46 


Investigate the relationship between outdoor tempera- 
ture and beak heat loss, as a percentage of total body 
heat loss. 


Does social rejection hurt? We often describe our 
emotional reaction to social rejection as “pain.” 
Does social rejection cause activity in areas of the 
brain that are known to be activated by physi- 
cal pain? If it does, we really do experience social 


and spruce cone production the previ- 
ous autumn over a 23-year period. The 
data for one study area appear in Table 4.2.° Squirrel 
population density is measured in squirrels per hectare. 
Spruce cone production is an index on a logarithmic 
scale, with larger values indicating larger spruce cone 
production. Discuss whether the data support the idea 
that higher spruce cone production in the autumn 
leads to a higher squirrel population density the follow- 
ing spring. £j),| serLco 
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SQUIRREL DENSITY CONE PRODUCTION SQUIRREL DENSITY CONE PRODUCTION 
(SQUIRRELS PER ha) (INDEX) (SQUIRRELS PER ha) (INDEX) 
1.0 0.1 il3) Ie,3} 
1.5) 0.3 Hs) 25) 
1.4 0.4 70) 2.9 
15 0.5 1.0 Dall 
0.7 0.5 1.4 33} 
Ie 0.8 0.9 3.6 
0.9 Il, ile3) 3.8 
1.0 1.4 1.9 4.2 
le; 1.8 1.8 5, Il 
1.4 2.0 1.9 De 
es} Yell 3.4 )9) 
0.8 Dee 
Teacher salaries. For each of the 50 states and the 2013 are available.”’ Discuss whether the data support 
District of Columbia, average Mathematics SAT the idea that higher teacher salaries lead to higher 
scores and average high school teacher salaries for Mathematics SAT scores. ) TCHSAL 


4.49 


4.50 


Exploring the Web 


Drive for show, putt for dough. A popular saying in golf is “You drive for show, 
but you putt for dough.” The point is that hitting the golf ball a long way with a 
driver looks impressive, but putting well is more important for the final score and 
hence the amount of money you win. You can find this season’s Professional Golf- 
ers, Association (PGA) Tour statistics at the PGA Tour website: www.pgatour. 
com/stats.html (click on “View All” under any category displayed to see the 
statistics for all golfers). You can also find these statistics at the ESPN website: 
espn.go.com/golf/statistics/_/year. Look at the most recent put- 
ting, driving, and money earnings data for the current season on the PGA Tour. 


(a) Make a scatterplot of earnings and putting average. Use earnings as the 
response variable. Describe the direction, form, and strength of the relationship in 
the plot. Are there any outliers? 

(b) Make a scatterplot of earnings and driving distance. Use earnings as the 
response variable. Describe the direction, form, and strength of the relationship in 
the plot. Are there any outliers? 

(c) Do your plots support the maxim “You drive for show but you putt for 
dough”? 


Olympic medals. Go to the Chance News website at www. causeweb.org/ 
wiki/chance/index.php/Chance News 61#Predicting medal_ 
counts and read the article “Predicting Medal Counts.” Next, search the web and 
locate the Winter Olympics medal counts for 2010 and 2014 (I found Winter Olym- 
pics medal counts on Wikipedia). Make a scatterplot that is similar to the one in the 
Chance News article but that uses the 2010 medal counts to predict the 2014 medal 
counts. How does your plot compare with the plot in the Chance News article? 


Masakazu Watanabe/Aflo/Getty Images 


Regression 


inear (straightline) relationships between two quantitative variables are easy 

to understand and quite common. In Chapter 4, we found linear relationships 

in settings as varied as Florida manatee deaths, the risk of cancer, and predict- 
ing tropical storms. Correlation measures the direction and strength of these linear 
relationships. When a scatterplot shows a linear relationship, we would like to 
summarize the overall pattern by drawing a line on the scatterplot. 


5.1 Regression lines 


A regression line summarizes the relationship between two variables, but only in 
a specific setting: one of the variables helps explain or predict the other. That is, 
regression describes a relationship between an explanatory variable and a response 
variable. 


Regression Line 


A regression line is a straight line that describes how a response variable y changes as 
an explanatory variable x changes. We often use a regression line to predict the value 
of y for a given value of x when we believe the relationship between y and x is linear. 


CHAPTER 


— 


In this chapter 
we cover... 


5.1 Regression lines 


5.2 The least-squares 
regression line 


5.3 Using technology 


5.4 Facts about least-squares 
regression 


5.5 Residuals 
5.6 Influential observations 


5.7 Cautions about correlation 
and regression 


5.8 Association does not imply 
causation 
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Why is it that some people find it easy to stay slim? Here, following our four-step 
process (page 63), is an account of a study that sheds some light on gaining weight. 


(STATE! Some people don’t gain weight even when they overeat. Perhaps fidgeting 
and other “nonexercise activity” (NEA) explains why. Some people may spontane- 
ously increase nonexercise activity when fed more. Researchers deliberately overfed 
16 healthy young adults for eight weeks. They measured fat gain (in kilograms) and, 
as an explanatory variable, change in energy use (in calories) from activity other 
than deliberate exercise—fidgeting, daily living, and the like. Change in energy use 
was energy use measured the last day of the eight-week period minus energy use 
measured the day before the overfeeding began. Here are the data:' 


NEA Change (cal) —94 —-57 —29 135 143 151 245 355 
Fat Gain (kg) 42 3.0 3.7 2.7 32 36 24 13 
NEA Change (cal) 392 473 486 535 571 580 620 690 
Fat Gain (ke) 38 #417 16 22 10 04 23 Id 


Do people with larger increases in NEA tend to gain less fat? 


PLAN: Make a scatterplot of the data, and examine the pattern. If it is linear, use 
correlation to measure its strength and draw a regression line on the scatterplot to 
predict fat gain from change in NEA. 


BBV Figure 5.1 is a scatterplot of these data. The plot shows a mod- 
erately strong negative linear association with no outliers. The correlation is 


This regression line predicts 
fat gain from NEA. 


4 
Tr ‘ : 
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eS e 
= 
<a sc xi 
me 
: = 
Rig This is the predicted = 
fat gain for a subject ‘< 
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FIGURE 5.1 


Fat gain after eight weeks of overeating, plotted against increase in nonexercise activity over the same 
period, for Example 5.1. 


r = —0.7786. The line on the plot is a regression line for predicting fat gain 
from change in NEA. 


ESOMSEA People with larger increases in NEA do indeed gain less fat. To add to 
this conclusion, we must study regression lines in more detail. 

We can, however, already use the regression line to predict fat gain from NEA. 
Suppose that an individual’s NEA increases by 400 calories when she overeats. Go 
“up and over” on the graph in Figure 5.1. From 400 calories on the x axis, go up to 
the regression line and then over to the y axis. The graph shows that the predicted 
gain in fat is a bit more than 2 kilograms. m 


Many calculators and software programs will give you the equation of a regres- 
sion line from keyed-in data. Understanding and using the line are more important 
than the details of where the equation comes from. 


Review of Straight Lines 


Suppose that y is a response variable (plotted on the vertical axis) and x is an explana- 
tory variable (plotted on the horizontal axis). A straight line relating y to x has an 
equation of the form 


y =at bx 


In this equation, b is the slope, the amount by which y changes when x increases by 
one unit. The number a is the intercept, the value of y when x = 0. 


Any straight line describing the NEA data has the form 
fat gain = a + (b X NEA change) 
The line in Figure 5.1 is the regression line with the equation 
fat gain = 3.505 — 0.00344 x NEA change 


Be sure you understand the role of the two numbers in this equation: 


The slope b =—0,00344 tells us that, on average, fat gained goes down by 
0.00344 kilogram for each added calorie of NEA change. The slope of a regres- 
sion line is the rate of change in the response, on average, as the explanatory 
variable changes. 

= The intercept, a = 3.505 kilograms, is the estimated fat gain if NEA does not 
change when a person overeats. 


The equation of the regression line makes it easy to predict fat gain. If a person's 
NEA change increases by 400 calories when she overeats, substitute x = 400 in the 
equation. The predicted fat gain is 


fat gain = 3.505 — (0.00344 X 400) = 2.13 kilograms 


This is a bit more than 2 kilograms, as we estimated directly from the plot in 
Example 5.1. 

When plotting a line on the scatterplot, use the equation to find the predicted y for 
two values of x, one near each end of the range of x in the data. Plot each y above 
its x value, and draw the line through the two points. m 
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in REGRESSION 
td TOWARD THE 
MEAN 


To “regress” means to go backward. 
Why are statistical methods 

for predicting a response from 

an explanatory variable called 
“regression”? Sir Francis Galton 
(1822-1911), who was the first 

to apply regression to biological 
and psychological data, looked at 
examples such as the heights of 
children versus the heights of their 
parents. He found that taller-than- 
average parents tended to have 
children who were also taller than 
average but not as tall as their 
parents. Galton called this fact 
“regression toward the mean,” and 
the name came to be applied to the 
statistical method. 


plotting a line 
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The slope of a regression line is an important numerical description of the rela- 
tionship between the two variables. Although we need the value of the intercept 
to draw the line, this value is statistically meaningful only when, as in Example 
5.2, the explanatory variable can actually take values close to zero. The slope 
b = —0.00344 in Example 5.2 is small. This does not mean that change in NEA 
has little effect on fat gain. The size of the slope depends on the units in which we 
measure the two variables. In this example, the slope is the change in fat gain in 
kilograms when NEA increases by one calorie. There are 1000 grams in a kilogram. 

If we measured fat gain in grams, the slope would be 1000 times larger, 
A b = —3.44. You can't say how important a relationship is by looking at the size 
of the slope of the regression line. 


() LaunchPad Online Resources 


* The StatClips Examples videos: Regression—Introduction and Motivation 
Examples A, B, and C, discuss examples of regression lines. 


Apply Your Knowledge 


5.1 City Mileage, Highway Mileage. We expect a car’s highway gas mileage to 
be related to its city gas mileage (in miles per gallon, mpg). Data for all 1137 
vehicles in the government’s 2013 Fuel Economy Guide give the regression line 


highway mpg = 6.785 + (1.033 X city mpg) 
for predicting highway mileage from city mileage. 
(a) What is the slope of this line? Say in words what the numerical value 
of the slope tells you. 


(b) What is the intercept? Explain why the value of the intercept is not 
statistically meaningful. 


(c) Find the predicted highway mileage for a car that gets 16 miles per gallon 
in the city. Do the same for a car with city mileage of 28 mpg. 


(d) Draw a graph of the regression line for city mileages between 10 and 
50 mpg. (Be sure to show the scales for the x and y axes.) 


5.2  What’s the Line? An online article suggested that for each additional 
person who took up regular running for exercise, the number of cigarettes 
smoked daily would decrease by 0.178. If we assume that 48 million ciga- 
rettes would be smoked per day if nobody ran, what is the equation of the 
regression line for predicting number of cigarettes smoked per day from the 
number of people who regularly run for exercise? 


5.3 Shrinking Forests. Scientists measured the annual forest loss (in square 
kilometers) in Indonesia from 2000 to 2012.3 They found the regression line 


forest loss = 7500 + (1021 X year since 2000) 
for predicting forest loss in square kilometers from years since 2000. 
(a) What is the slope of this line? Say in words what the numerical value 
of the slope tells you. 


(b) If we measured forest loss in square meters per year, what would the 
slope be? Note that there are 10° square meters in a square kilometer. 

(c) Ifwe measured forest loss in thousands of square kilometers per year, 
what would the slope be? 
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In most cases, no line will pass exactly through all the points in a scatterplot. 
Different people will draw different lines by eye. We need a way to draw a regres- 
sion line that doesn’t depend on our guess of where the line should go. Because we 
use the line to predict y from x, the prediction errors we make are errors in y, the 
vertical direction in the scatterplot. A good regression line makes the vertical distances 
of the points from the line as small as possible. 

Figure 5.2 illustrates the idea. This plot shows three of the points from Figure 5.1, 
along with the line, on an expanded scale. The line passes above one of the points 
and below two of them. The three prediction errors appear as vertical line segments. 
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For example, one subject had x = —57, a decrease of 57 calories in NEA. The line 


predicts a fat gain of 3.7 kilograms, but the actual fat gain for this subject was 3.0 
kilograms. The prediction error is 


error = observed response — predicted response 
= 3.0 — 3.7 = —0.7 kilogram 


There are many ways to make the collection of vertical distances “as small as 
possible.” The most common is the least-squares method. 


Least-Squares Regression Line 


The least-squares regression line of y on x is the line that makes the sum of the 
squares of the vertical distances of the data points from the line as small as possible. 


One reason for the popularity of the least-squares regression line is that the 
problem of finding the line has a simple answer. We can give the equation for the 
least-squares line in terms of the means and standard deviations of the two variables 
and the correlation between them. 


FIGURE 5.2 

The least-squares idea. For each obser- 
vation, find the vertical distance of each 
point on the scatterplot from a regres- 
sion line. The least-squares regression 
line makes the sum of the squares of 
these distances as small as possible. 
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FIGURE 5.3 

Least-squares regression for the nonex- 
ercise activity data: output from a graph- 
ing calculator, three statistical programs, 
and a spreadsheet program. 


Equation of the Least-Squares Regression Line 


We have data on an explanatory variable x and a response variable y for n indi- 
viduals. From the data, calculate the means x and y and the standard deviations 
s, and s, of the two variables and their correlation r. The least-squares regression 
line is the line 


y =a +t bx 


with slope 


and intercept 


a=y — bx 


We write y (read “y hat”) in the equation of the regression line to emphasize that 
the line gives a predicted response ¥ for any x. Because of the scatter of points about 
the line, the predicted response will usually not be exactly the same as the actually 
observed response y. In practice, you don’t need to calculate the means, standard 
deviations, and correlation first. Software or your calculator will give the slope b 
and intercept a of the least-squares line from the values of the variables x and y. You 
can then concentrate on understanding and using the regression line. 


Least-squares regression is one of the most common statistical procedures. Any 
technology you use for statistical calculations will give you the least-squares line 
and related information. Figure 5.3 displays the regression output for the data of 


Texas Instruments Graphing Calculator 


Minitab 
08 
Regression Analysis: fat versus nea i 


The regression equation is 
fat = 3.51 - 0.00344 nea 


Predictor Coef SE Coef iD Pp 
Constant 3.5051 0.3036 11.54 0.000 
-0.0034415 0.0007414 -4.64 0.000 


0.739853 R-Sq : (adj) 


J | 
a BIZ 


(Continued) 


5.3 Using Technology 


Crunchlt! FIGURE 5.3 


eas a = Continued 
Results - Simple Linear Regression 3 | jm} | ( 


Fitted Equation: 
Fat = 3.505 - 0.003441 * NEA 


Estimate Std. Error t value Pr (>|t|) 
(Intercept) 3.505 0.3036 11.54 <0.0001 


NEA -0.003441 0.0007414 -4.642 0.0003810 


r-Squared: 0.6061 Adjusted r-Squared: 0.5780 sigma: 0.7399 


Microsoft Excel 


a rr a 
ee es ey Saas 

|__Regression statistics | 
Jmultiple Rs [o.77assseas | 
[R square [o.6o61as205 | 
Adjusted R Square] 0.578017005 | PT 
[standard Error | 0.739852874| | 
Jobservations [tof 
eee (es | ees (en eee oy) 
Po Cet ficients| standard Error] t stat_| P-value |__| 
Jintercept_ | 3505122916] 0.303616403 [1.54458] 1.53x-08| | 
SS ee 


~ | Linear Fit 
Fat = 3.5051229 - 0.0034415*NEA 
v7 |Summary of Fit 


0.606149 
0.578017 
0.739853 
2.3875 
16 


> | Analysis of Variance 


wv | Parameter Estimates 


So Obi229 0.303616 11.54 <.0001* 
-0.003441 0.000741 -4.64 0.0004* 


Examples 5.1 and 5.2 from a graphing calculator, three statistical programs, and 
a spreadsheet program. Each output records the slope and intercept of the least- 
squares line. The software also provides information that we do not yet need, 
although we will use much of it later. (In fact, we left out part of the Minitab and 
Excel outputs.) Be sure that you can locate the slope and intercept on all four out- 
puts. Once you understand the statistical ideas, you can read and work with almost any 
software output. 
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(3) LaunchPad Online Resources 


* The Snapshots video, Introduction to Regression, discusses least-squares 
regression in the context of a real example (predicting surface water supply 
from the snowpack in Colorado). 


* The software Crunchit! is available in the student resources, as are Crunchlt! 
help videos. Also among the online student resources are Video Technology 
Manuals for Minitab, the Tl graphing calculator, Excel, JMP and SPSS. 


Apply Your Knowledge 


5.4 Coral Reefs. Exercises 4.2 and 4.10 discuss a study in which scientists 
examined data on mean sea surface temperatures (in degrees Celsius) and 


mean coral growth (in millimeters per year) over a several-year period at 
locations in the Red Sea. Here are the data:* Ely) CORAL 


Sea Surface Temperature 29.68 29.87 30.16 30.22 30.48 30.65 30.90 
Growth 2.63 2.58 2.68 2.60 2.48 2.38 2.26 


(a) Use your calculator to find the mean and standard deviation of both 
sea surface temperature x and growth y and the correlation r between 
x and y. Use these basic measures to find the equation of the least- 
squares line for predicting y from x. 


(b) Enter the data into your software or calculator, and use the regression 
function to find the least-squares line. The result should agree with 
your work in part (a) up to roundoff error. 


(c) Say in words what the numerical value of the slope tells you. 


5.5 Death by Intent. Homicide and suicide are both intentional means of ending 
a life. However, the reason for committing a homicide is different from that for 


Homicide Suicide Homicide Suicide 

County Rate Rate County Rate Rate 

Allen 4.2 9.2 Lorain 3.1 11.0 
Ashtabula 1.8 15.5 Lucas 7.4 13.3 
Butler 2.6 od Mahoning 10.9 12.4 
Clermont 1.0 16.0 Medina 0.5 10.0 
Clark 5.6 14.5 Miami 2.6 9.2 
Columbiana 3.5 16.6 Montgomery 95 15.2 
Cuyahoga 9.2 9.5 Portage 1.6 9.6 
Delaware 0.8 7.6 Stark 4.7 13.5 
Franklin 8.7 11.4 Summit 4.9 ES) 
Greene 2.7 12.8 Trumbull 5.8 16.6 
Hamilton 8.9 10.8 Warren 0.7 11.3 
Lake 1.8 11.3 Wayne 1.8 8.9 
Licking 4.5 12.9 Wood 1.0 7.4 


5.4 Facts About Least-Squares Regression | 135 


suicide and we might expect homicide and suicide rates to be uncorrelated. On 
the other hand, both can involve some degree of violence, so perhaps we might 
expect some level of correlation in the rates. The data from 2008-10 for 26 coun- 
ties in Ohio are shown in the table.° Rates are per 100,000 people.Fij) DEATH 


(a) Make a scatterplot that shows how suicide rate can be predicted from 
homicide rate. There is a weak linear relationship, with correlation 
r = 0.23. 

(b) Find the least-squares regression line for predicting suicide rate from 
homicide rate. Add this line to your scatterplot. 

(c) Explain in words what the slope of the regression line tells us. 


(d) Another Ohio county has a homicide rate of 8.0 per 100,000 people. 
What is the county’s predicted suicide rate? 


One reason for the popularity of least-squares regression lines is that they have 
many convenient properties. Here are some facts about least-squares regression 
lines. 

Fact 1. The distinction between explanatory and response variables is 
essential in regression. Least-squares regression makes the distances of the data 
points from the line small only in the y direction. If we reverse the roles of the 
two variables, we get a different least-squares regression line. 


Figure 5.4 repeats the scatterplot of the NEA data in Figure 5.1, but with two least- 
squares regression lines. The solid line is the regression line for predicting fat gain 
from change in NEA. This is the line that appeared in Figure 5.1. 

We might also use the data on these 16 subjects to predict the change in NEA for 
another subject from that subject’s fat gain when overfed for eight weeks. Now the 
roles of the variables are reversed: fat gain is the explanatory variable, and change 
in NEA is the response variable. The dashed line in Figure 5.4 is the least-squares 

line for predicting NEA change from fat gain. The two regression lines are 
A not the same. /n the regression setting, you must know clearly which variable 
is explanatory. ™ 


Fact 2. There is a close connection between correlation and the slope of the 
least-squares line. The slope is 


b=r- 


You see that the slope and the correlation always have the same sign. For 
example, if a scatterplot shows a positive association, then both b and r are positive. 
The formula for the slope b says more: along the regression line, a change of one 
standard deviation in x corresponds to a change of r standard deviations in y. 
When the variables are perfectly correlated (” = 1 or r = —1), the change in the 
predicted response y is the same (in standard deviation units) as the change in x. 
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FIGURE 5.4 

Two least-squares regression lines 
or the nonexercise activity data, for 
Example 5.3. The solid line predicts 
at gain from change in nonexercise 
activity. The equation for this line is 
at = 3.505 — 0.00344 NEA. The 


cise activity from fat gain. The equation 
or this line is NEA = 745.3 — 176 fat 
or after rearranging terms, fat = 4.23 
— 0.00568 NEA). 


dashed line predicts change in nonexer- 
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Otherwise, because —1 <r < 1, the change in y (in standard deviation units) is 
less than the change in x (in standard deviation units). As the correlation grows less 
strong, the prediction ‘y moves less in response to changes in x. 

Fact 3. The least-squares regression line always passes through the point 
(x, y) on the graph of y against x. This is a consequence of the equation of the 
least-squares regression line (box on page 132). In Exercise 5.54 we ask you to 
confirm this. 

Fact 4. The correlation r describes the strength of a straight-line relation- 
ship. In the regression setting, this description takes a specific form: the square 
of the correlation, r’, is the fraction of the variation in the values of y that is 
explained by the least-squares regression of y on x. 

The idea is that when there is a linear relationship, some of the variation in 
y is accounted for by the fact that as x changes, y changes along with it. Look 
again at Figure 5.1, the scatterplot of the NEA data. The variation in y appears 
as the spread of fat gains from 0.4 to 4.2 kg. Some of this variation is explained 
by the fact that x (change in NEA) varies from a loss of 94 calories to a gain of 
690 calories. As x changes from —94 to 690, y changes along the line. You would 
predict a smaller fat gain for a subject whose NEA increased by 600 calories than 
for someone with 0 change in NEA. But the straight-line tie of y to x doesn’t 
explain all of the variation in y. The remaining variation appears as the scatter of 
points above and below the line. 

Although we won’t do the algebra, it is possible to break the variation in the 
observed values of y into two parts. One part measures the variation in y along the 
least-squares regression line as x varies. The other measures the vertical scatter of 
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the data points above and below the line. The squared correlation r’ is the first of 
these as a fraction of the whole: 


variation in y along the regression line as x varies 


total variation in observed values of y 


For the NEA data, r = —0.7786 and r = (—0.7786)* = 0.6062. About 61% of the 
variation in fat gained is accounted for by the linear relationship with change in NEA. 
The other 39% is individual variation among subjects that is not explained by the 
linear relationship. 

Figure 4.2 (page 108) shows a stronger linear relationship between boat 
registrations in Florida and manatees killed by boats. The correlation is r= 0.954 
and r = (0,954)? = 0.910. Ninety-one percent of the year-to-year variation in 
number of manatees killed by boats is explained by regression on number of 
boats registered. Only about 9% is variation among years with similar numbers 
of boats registered. m 


You can find a regression line for any relationship between two quantitative 
A variables, but the usefulness of the line for prediction depends on the strength 

of the linear relationship. So r? is almost as important as the equation of 
the line in reporting a regression. All the outputs in Figure 5.3 include 1’, either 
in decimal form or as a percent. When you see a correlation, square it to get 
a better feel for the strength of the association. Perfect correlation (r = —1 or 
r = 1) means the points lie exactly on a line. Then 1’? = 1, and all the variation 
in one variable is accounted for by the linear relationship with the other variable. 


Ifr = —0.7 orr = 0.7, 7? = 0.49 and about half the variation is accounted for by 
the linear relationship. In the 1’ scale, correlation +0.7 is about halfway between 
Oand +1, 


Facts 2, 3, and 4 are special properties of least-squares regression. They are not 
true for other methods of fitting a line to data that are discussed in more advanced 
courses. 


Apply Your Knowledge 


5.6 How Usefulls Regression? Figure 4.8 (page 120) displays the relation- 
ship between golfers’ scores on the first and second rounds of the 2013 Mas- 
ters Tournament. The correlation is r = 0.246. Exercise 4.30 (page 122) 
gives data on solar radiation (SRD) and concentration of dimethy] sulfide 
(DMS) over a region of the Mediterranean. The correlation is r = 0.969. 
Explain in simple language why knowing only these correlations enables 
you to say that prediction of DMS from SRD by a regression line will 
be much more accurate than prediction of a golfer’s second-round score 
from his first-round score. 


5.7 Feed the Birds. Exercise 4.32 (page 122) gives data from a study in which 
canary parents cared for both their own babies and those of other parents. 
Investigators looked at how the growth rate of the foster babies relative to 
the growth rate of the natural babies changed as the begging intensity for 
food by the foster babies increased over the begging intensity of the natural 
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babies. If begging intensity is the main factor determining food received, with 
higher intensity leading to more food, one would expect the relative growth 
rate to increase as the difference in begging intensity increases. However, if 
both begging intensity and a preference for their own babies determine the 
amount of food received (and hence the relative growth rate), we might ex- 
pect growth rate to increase initially as begging intensity increases but then 
to level off (or even decrease) as the parents begin to ignore further increases 


in begging by the foster babies. CANARY 


(a) Make a scatterplot of the data. Find the least-squares regression line 
for predicting relative growth rate of the foster brood from the dif- 
ference in begging intensity between the foster brood and the actual 
babies of the parents, and add this line to your plot. Should we not use 
the regression line for prediction in this setting? 

(b) What is 7’? What does this value say about the success of the regres- 
sion line in predicting relative growth rate? 


One of the first principles of data analysis is to look for an overall pattern 
and also for striking deviations from the pattern. A regression line describes 
the overall pattern of a linear relationship between an explanatory variable 
and a response variable. We see deviations from this pattern by looking at the 
scatter of the data points about the regression line. The vertical distances from 
the points to the least-squares regression line are as small as possible, in the 
sense that they have the smallest possible sum of squares. Because they repre- 
sent “leftover” variation in the response after fitting the regression line, these 
distances are called residuals. 


Residuals 


A residual is the difference between an observed value of the response variable and 
the value predicted by the regression line. That is, a residual is the prediction error 
that remains after we have chosen the regression line: 


residual = observed y — predicted y 
aia 


“Empathy” means being able to understand what others feel. To see how the brain 
expresses empathy, researchers recruited 16 couples in their midtwenties who 
were married or had been dating for at least two years. They zapped the man’s 
hand with an electrode while the woman watched, and measured the activity in 
several parts of the woman's brain that would respond to her own pain. Brain 
activity was recorded as a fraction of the activity observed when the woman her- 
self was zapped with the electrode. The women also completed a psychological 
test that measures empathy. Will women who score higher in empathy respond 


more strongly when their partner has a painful experience? Here are data for one 
brain region:® 


Subject 1 2 3 4 5 6 7 8 
Empathy Score 38 53 41 55 56 61 62 48 
Brain Activity —0.120 0.392 0.005 0.369 0.016 0.415 0.107 0.506 
Subject 9 10 11 12 13 14 15 16 
Empathy Score 43 47 56 65 19 61 32 105 
Brain Activity 0.153 0.745 0.255) 0.574 0.210 0.722 0.358 0.779 


Figure 5.5 is a scatterplot, with empathy score as the explanatory variable x and brain 
activity as the response variable y. The plot shows a positive association. That is, 
women who score higher in empathy do indeed react more strongly to their partner’s 
pain. The overall pattern is moderately linear, with correlation r = 0.515. 
The line on the plot is the least-squares regression line of brain activity on empathy 
score. Its equation is 
brain activity = —0.0578 + 0.00761(empathy score) 
For Subject 1, with empathy score 38, we predict 
brain activity = —0.0578 + (0.00761)(38) = 0.231 
This subject’s actual brain activity level was —0.120. The residual is 
residual = observed brain activity — predicted brain activity 
= —0.120 — 0.231 = —0.351 


The residual is negative because the data point lies below the regression line. The 
dashed line segment in Figure 5.5 shows the size of the residual. m 
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FIGURE 5.5 

Scatterplot of activity in a region 

of the brain that responds to pain 
versus score on a test of empathy, for 
Example 5.5. Brain activity is measured 
as the subject watches her partner 
experience pain. The line is the least- 
squares regression line. 
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FIGURE 5.6 

Residual plot for the data shown in 
Figure 5.5. The horizontal line at zero 
residual corresponds to the regression 
line in Figure 5.5. 


There is a residual for each data point. Finding the residuals is a bit unpleasant 
because you must first find the predicted response for every x. Software or a graph- 
ing calculator gives you the residuals all at once. Here are the 16 residuals for the 
empathy study data, from software: 


residuals: 
=0.3515.—0).2494'—0:,3526 —0.3072 —0..1166 -—O.1136 0.12237 0.1727 
0.0463 0.0080 0.0084 0.1983 0.4449 0.1369 0.3154 0.0374 


Because the residuals show how far the data fall from our regression line, exam- 
ining the residuals helps us assess how well the line describes the data. Although 
residuals can be calculated from any curve or line fitted to the data, the residuals 
from the least-squares line have a special property: the mean of the least-squares 
residuals is always zero. 

Compare the scatterplot in Figure 5.5 with the residual plot for the same data in 
Figure 5.6. The horizontal line at zero in Figure 5.6 helps orient us. This “residual = 0” 
line corresponds to the regression line in Figure 5.5. 
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Residual Plots 


A residual plot is a scatterplot of the regression residuals against the explanatory 
variable. Residual plots help us assess how well a regression line fits the data. 


A residual plot in effect turns the regression line horizontal. It magnifies the 
deviations of the points from the line and makes it easier to see unusual observa- 
tions and patterns. 

Figure 5.7 shows the overall pattern of some typical residual plots in simplified 
form. The residuals are plotted in the vertical direction against the corresponding 
values of the explanatory variable in the horizontal direction. If our assumptions 
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hold, the pattern of this plot will be an unstructured horizontal band centered at 
O (the mean of the residuals) and symmetric about 0, as in Figure 5.7(a). A curved 
pattern, like the one in Figure 5.7(b), indicates that the relationship between the 
response and explanatory variable is curved rather than linear. A straight line is 
not a good description of such a relationship. A fan-shaped pattern like the one in 
Figure 5.7(c) shows that the variation of the response about the least-squares line 
increases as the explanatory variable increases. Predictions of the response will 
be more precise for smaller values of the explanatory variable, where the response 
shows less variability about the line. 
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FIGURE 5.7 
Some typical residual plots in simplified form. 
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() LaunchPad Online Resources 


* The StatBoards video, Calculating and Plotting Residuals, provides the 
details of calculating residuals and making a residual plot. 


Apply Your Knowledge 


5.8 


5.9 


Residuals by Hand. In Exercise 5.4 you found the equation of the 
least-squares line for predicting coral growth y from mean sea surface 
temperature x. 


(a) Use the equation to obtain the seven residuals step by step. That is, find 
the prediction y for each observation and then find the residual y — 4. 


(b) Check that (up to roundoff error) the residuals add to 0. 


(c) The residuals are the part of the response y left over after the straight- 
line tie between y and x is removed. Show that the correlation between 
the residuals and x is 0 (up to roundoff error). That this correlation is 
always 0 is another special property of least-squares regression. 

Does Fast Driving Waste Fuel? Exercise 4.8 (page 109) gives data on the 

fuel consumption y of a car at various speeds x. Fuel consumption is mea- 

sured in liters of gasoline per 100 kilometers driven, and speed is measured in 
kilometers per hour. Software tells us that the equation of the least-squares 
regression line is 


y = 11.058 — 0.01466x 


Using this equation, we can add the residuals to the original data: ely) FASTDR2 


Speed 
Fuel 


10 20 30 40 50 60 70 80 
ZOO 1300) 10.00 8.00 7.00 5.90 6.30 6.95 


Residual 10.09 dee AWieyh, igi ysis) Gis) 3) 2, 


Speed 
Fuel 


90 100 110 120 130 140 150 
CBT 8.27 9.03 See ONS RE Se 2288 


Residual —2.17 —1.32 —0.42 0.57 1.64 2.76 3.97 


5.10 


(a) Make a scatterplot of the observations, and draw the regression line on 
your plot. 


(b) Would you use the regression line to predict y from x? Explain your answer. 


(c) Verify the value of the first residual, for x = 10. Verify that the residuals 
have sum zero (up to roundoff error). 


(d) Make a plot of the residuals against the values of x. Draw a horizontal 
line at height zero on your plot. How does the pattern of the residuals 
about this line compare with the pattern of the data points about the 
regression line in your scatterplot from part (a)? 


Not Obvious to the Naked Eye. The data set “resids” contains the values 

of a response y, an explanatory variable x, and the residuals from the least- 

squares regression line for predicting y from x. £j))) RESIDS 

(a) Make a scatterplot of the observations, and draw the regression line on 
your plot. 

(b) Would you use the regression line to predict y from x? Explain your 
answer. 


5.6 


(c) Make a plot of the residuals against the values of x. Draw a horizontal 
line at height zero on your plot. How does the pattern of the residuals 
about this line compare with the pattern of the data points about the 
regression line in your scatterplot from part (a)? 


Figures 5.5 and 5.6 show one unusual observation: Subject 16 is an outlier in the 
x direction, with empathy score 40 points higher than any other subject. Because 
of its extreme position on the empathy scale, this point has a strong influence on 
the correlation. Dropping Subject 16 reduces the correlation from r = 0.515 to 
r = 0.331. You can see that this point extends the linear pattern in Figure 5.5 and 
so increases the correlation. We say that Subject 16 is influential for calculating the 
correlation. 


Influential Observations 


An observation is influential for a statistical calculation if removing it would markedly 
change the result of the calculation. 

The result of a statistical calculation may be of little practical use if it depends strongly 
on a few influential observations. 

Points that are outliers in either the x or the y direction of a scatterplot are often 
influential for the correlation. Points that are outliers in the x direction are often 
influential for the least-squares regression line. 


What constitutes a “marked” change? This is somewhat subjective. Changes 
in a calculation that are the same size as round-off error are often not influential. 
Changes in a calculation that differ by a factor of 1.5 or more are often influential. 
If the least-squares regression line computed after removing an observation still 
fits the original data in the scatterplot, the observation is probably not influential. 
However, one can find exceptions to these guidelines and statisticians may disagree 
as to whether an observation should be considered influential. 

If an observation is influential, or if there is some doubt as to whether an obser- 
vation is influential, it can be informative to report statistical calculations with both 
the observation included and the observation removed. This provides readers with 
the ability to assess the effect of the observation. 


Subject 16 in Example 5.5 is influential for the correlation between empathy score 
and brain activity, because removing it reduces r from 0.515 to 0.331. Calculating 
that r= 0.515 is not a very useful description of the data, because the value 
depends so strongly on just one of the 16 subjects. 

Is this observation also influential for the least-squares line? Figure 5.8 shows 
that it is not. The regression line calculated without Subject 16 (dashed) differs little 
from the line that uses all the observations (solid). The reason that the outlier has 
little influence on the regression line is that it lies close to the dashed regression line 
calculated from the other observations. m 


Influential Observations 
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FIGURE 5.8 
Subject 16 is an outlier in the x direction. 


The outlier is not influential for least- coal 
squares regression because removing it 
moves the regression line only a little. 1.0 5 
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To see why points that are outliers in the x direction are often influential for 
regression, let’s try an experiment. Suppose that Subject 16’s point in the scatterplot 
moves straight down. What happens to the regression line? Figure 5.9 gives the 
answer. The dashed line is the regression line with the outlier in its new, lower position. 
Because there are no other points with similar x-values, the line chases the outlier. 


Move the outlier 
down... 


-2 


Brain activity 


FIGURE 5.9 

An outlier in the x direction pulls the 
least-squares line to itself because there 
are no other observations with similar —0.4 5 
values of x to hold the line in place. 

When the outlier moves down, the re- T y T T 
gression line chases it down. The original : ay a ie ee 109 
regression line is solid, and the final posi- Empathy score 

tion of the regression line is dashed. 
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Exercise 5.11. An outlier in x pulls the least-squares line toward itself. If the outlier does not 
lie close to the line calculated from the other observations, it will be influential. 

We did not need the distinction between outliers and influential observations 
in Chapter 2. A single high salary that pulls up the mean salary x for a group of 
workers is an outlier because it lies far above the other salaries. It is also influential 
because the mean changes when it is removed. In the regression setting, however, 
not all outliers are influential. 


Apply Your Knowledge 


5.11 Influence in Regression. The Correlation and Regression applet allows you @ 


& The Correlation and Regression applet allows you to try this experiment yourself—see 


to animate Figure 5.9. Click to create a group of 10 points in the lower-left 
corner of the scatterplot with a strong straight-line pattern (correlation about 
0.9). Click the “Show least-squares line” box to display the regression line. 


(a) Add one point at the upper right that is far from the other 10 points 
but exactly on the regression line. Why does this outlier have no effect 
on the line even though it changes the correlation? 


(b) Now use the mouse to drag this last point straight down. You see that 
one end of the least-squares line chases this single point, whereas the 
other end remains near the middle of the original group of 10. What 
makes the last point so influential? 


5.12 Death by Intent. Return to the data of Exercise 5.5 on homicide rate and 
suicide rate. We will use these data to illustrate influence. DEATH3 


(a) Make a scatterplot of the data that is suitable for predicting suicide rate 
from homicide rate, with two new points added. Point A: homicide 
rate 10.8, suicide rate 27.0. Point B: homicide rate 18.0, suicide rate 
14.6. In which direction is each of these points an outlier? 


(b) Add three least-squares regression lines to your plot: for the original 26 
counties, for the original 26 counties plus Point A, and for the original 
26 counties plus Point B. Which new point is more influential for the 
regression line? Explain in simple language why each new point moves 
the line in the way your graph shows. 


5.13 Outsourcing by Airlines. Exercise 4.5 (page 105) gives data for nine airlines 
on the percent of major maintenance outsourced and the percent of flight 
delays blamed on the airline. AIRLINE 


(a) Make a scatterplot with outsourcing percent as x and delay percent as 
y. Would you consider Hawaiian Airlines to be influential? Would you 
consider Frontier Airlines to be influential? 


(b) Find the correlation r with and without Hawaiian Airlines. How influ- 
ential is Hawaiian Airlines for correlation? 


(c) Find the correlation r without Frontier Airlines. How influential is 
Frontier Airlines for correlation? 


(d) Find the correlation r without both Hawaiian and Frontier Airlines. 
How influential is the pair of airlines for correlation? 


(e) Find the least-squares line for predicting y from x with and without 
both Hawaiian and Frontier Airlines. Draw both lines on your scat- 
terplot. Use both lines to predict the percent of delays blamed on an 
airline that has outsourced 79.1% of its major maintenance, and on an 
airline that has outsourced 0.1% of its major maintenance. How influ- 
ential are Hawaiian and Frontier Airlines for the least-squares line? 
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int DO LEFT- 
i) HANDERS DIE 


EARLY? 
Yes, said a study of 1000 deaths 
in California. Left-handed people 
died at an average age of 66 years; 
right-handers died at 75 years of 
age. Should left-handed people 
fear an early death? No—the 
lurking variable has struck again. 
Older people grew up in an era 
when many natural left-handers 
were forced to use their right 
hands. So right-handers are more 
common among older people, and 
left-handers are more common 
among the young. When we look 
at deaths, the left-handers who die 
are younger on the average because 
left-handers in general are younger. 
Mystery solved. 
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Correlation and regression are powerful tools for describing the relationship 
between two variables. When you use these tools, you must be aware of their limita- 
tions. You already know that 


the calculations for any relationship between two quantitative variables, 
but the results are useful only if the scatterplot shows a linear pattern. 


« Correlation and least-squares regression lines are not resistant. Always plot 
your data and look for observations that may be influential. 


A = Correlation and regression lines describe only linear relationships. You can do 


Here are three more things to keep in mind when you use correlation and regression. 


Beware ecological correlation There is a large positive correlation 
between average income and number of years of education. The correlation is 
smaller if we compare the incomes of individuals with number of years of education. 
The correlation based on average income ignores the large variation in the incomes 
of individuals having the same amount of education. The variation from individual 
to individual increases the scatter in a scatterplot, reducing the correlation. The 
correlation between average income and education overstates the strength of the 

relation between the incomes of individuals and number of years of educa- 
A tion. Correlations based on averages can be misleading if they are interpreted 
to be about individuals. 


Ecological Correlation 


A correlation based on averages rather than on individuals is called an ecological 
correlation. 


Beware extrapolation Suppose that you have data on a child’s growth 
between three and eight years of age. You find a strong linear relationship between 
age x and height y. If you fit a regression line to these data and use it to predict 
height at age 25 years, you will predict that the child will be eight feet tall. Growth 

slows down and then stops at maturity, so extending the straight line to 
adult ages is foolish. Few relationships are linear for all values of x. Don’t make 
predictions far outside the range of x that actually appears in your data. 


Extrapolation 


Extrapolation is the use of a regression line for prediction far outside the range of 
values of the explanatory variable x that you used to obtain the line. Such predictions 
are often not accurate. 


Beware the lurking variable Another caution is even more important: 

the relationship between two variables can often be understood only by tak- 

A ing other variables into account. Lurking variables can make a correlation 
or regression misleading. 
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Lurking Variable 


A lurking variable is a variable that is not among the explanatory or response 
variables in a study and yet may influence the interpretation of relationships among 
those variables. 


You should always think about possible lurking variables before you draw conclu- 
sions based on correlation or regression. 


The Kalamazoo (Michigan) Symphony once advertised a “Mozart for Minors” program 
with this statement: “Question: Which students scored 51 points higher in verbal skills 
and 39 points higher in math? Answer: Students who had experience in music.”” 

We could as well answer “Students who played soccer.” Why? Children with 
prosperous and well-educated parents are more likely than poorer children to have 
experience with music and also to play soccer. They are also likely to attend good 
schools, get good health care, and be encouraged to study hard. These advantages 
lead to high test scores. Family background is a lurking variable that explains why 
test scores are related to experience with music. m 


() LaunchPad Online Resources 


* The StatBoards video, Beware Extrapolation!, provides a discussion of the 
dangers of extrapolation. 


Apply Your Knowledge 


5.14 SAT Scores. The correlation between mean 2013 Math SAT scores and 
mean 2013 Writing SAT scores for all 50 states and the District of Columbia is 
0.972. Would you expect the correlation between the mean state SAT scores 
for these two tests to be lower, about the same, or higher than the correlation 
between the scores of individuals on these two tests? Explain your answer. 


5.15 The Endangered Manatee. Table 4.1 gives 37 years of data on boats 
registered in Florida and manatees killed by boats. Figure 4.2 (page 108) 
shows a strong positive linear relationship. The correlation is r = 0.954. 
2i))) MANATEE 


(a) Find the equation of the least-squares line for predicting manatees 
killed from thousands of boats registered. Because the linear pattern is 
so strong, we expect predictions from this line to be quite accurate—but 
only if conditions in Florida remain similar to those of the past 37 years. 


PhotoDisc 


(b) Suppose we expect that the number of boats registered in Florida will 
be 890,000 in 2014. What would you predict the number of manatees 
killed by boats to be if there are 890,000 boats registered? Explain why 
we can trust this prediction. 


(c) Predict manatee deaths if there were no boats registered in Florida. 
Explain why the predicted count of deaths is impossible. (We use x = 0 
to find the intercept of the regression line, but unless the explanatory 
variable x actually takes values near 0, prediction for x = 0 is an example 
of extrapolation.) 
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PTS 
int THE SUPER 
nd BOWL EFFECT 

The Super Bowl is the most-watched 
TV broadcast in the United States. 
Data show that, on Super Bowl 
Sunday, we consume three times as 
many potato chips as on an average 
day, and 17 times as much beer. 
What’s more, the number of fatal 
traffic accidents goes up in the hours 
after the game ends. Could that be 
celebration? Or catching up with tasks 
left undone? Or maybe it’s the beer. 
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5.16 Is Math the Key to Success in College? A College Board study of 
15,941 high school graduates found a strong correlation between how much 
math minority students took in high school and their later success in college. 
News articles quoted the head of the College Board as saying that “math is 
the gatekeeper for success in college.”* Maybe so, but we should also think 
about lurking variables. What might lead minority students to take more or 
fewer high school math courses? Would these same factors influence success 
in college? 


Thinking about lurking variables leads to the most important caution about cor- 
relation and regression. When we study the relationship between two variables, we 
often hope to show that changes in the explanatory variable cause changes in the 

response variable. A strong association between two variables is not enough 

A to draw conclusions about cause and effect. A least-squares regression line 

that fits the data well and gives accurate predictions is also not enough to 

draw conclusions about cause and effect. Sometimes an observed association really 

does reflect cause and effect. A household that heats with natural gas uses more 

gas in colder months because cold weather requires burning more gas to stay warm. 

In other cases, an association is explained by lurking variables, and the conclusion 
that x causes y is either wrong or not proven. 


A serious study once found that people with two cars live longer than 
people who own only one car.? Owning three cars is even better, and 
so on. There is a substantial positive correlation between number of 
cars x and length of life y. 

The basic meaning of causation is that by changing x we can bring 
about a change in y. Could we lengthen our lives by buying more 
cars? No. The study used number of cars as a quick indicator of afflu- 
ence. Well-off people tend to have more cars. They also tend to live 
longer, probably because they are better educated, take better care 
of themselves, and get better medical care. The cars have nothing to 
do with it. There is no cause-and-effect tie between number of cars 
and length of life. m 


Correlations such as that in Example 5.8 are sometimes called “nonsense 
correlations.” The correlation is real. What is nonsense is the conclusion that 
changing one of the variables causes changes in the other. A lurking variable—such 
as personal affluence in Example 5.8—that influences both x and y can create a 
high correlation even though there is no direct connection between x and y. 


Association Does Not Imply Causation 


An association between an explanatory variable x and a response variable y, even 
if it is very strong, is not by itself good evidence that changes in x actually cause 
changes in y. 


5.8 Association Does Not Imply Causation 


Exercise 4.48 asks you to explore the relationship between Mathematics SAT scores 
and average teacher salaries using 2013 data for each of the 50 states and the 
District of Columbia. A scatterplot of these data shows that SAT scores and salaries 
are negatively correlated and r = —0.278. 

Average scores on the Mathematics SAT are partly determined by the percent 
of test takers. States that don’t require the SAT have fewer test takers. Typically, 
only better students take the SAT exam in these states, hence average scores are 
high. States that do require the SAT tend to be states with higher costs of living 
(for example, states in the Northeastern U.S. such as New York, Connecticut, and 
Massachusetts), and hence higher average teacher salaries. These lurking variables 
explain the negative correlation, and you will explore this in Exercises 5.50 and 5.51. 
In fact, you will discover that, after accounting for the percent of test takers, the 
association actually reverses! m 


Overweight parents tend to have overweight children. The results of a study of 
Mexican American girls aged nine to 12 years are typical. The investigators measured 
body mass index (BMI), a measure of weight relative to height, for both the girls and 
their mothers. People with high BMI are overweight. The correlation between the BMI 
of daughters and the BMI of their mothers was r = 0.506.'° 

Body type is in part determined by heredity. Daughters inherit half their genes 
from their mothers. There is therefore a direct cause-and-effect link between the 
BMI of mothers and daughters. But perhaps mothers who are overweight also set 
an example of little exercise, poor eating habits, and lots of television viewing. 
Their daughters may pick up these habits, so the influence of heredity is mixed up 
with influences from the girls’ environment. Both contribute to the mother-daughter 
correlation. m 


The lesson of Example 5.9 is that association does not imply causation. A lurking 
variable may not only help explain the observed association, but once accounted for 
may reverse the observed association. In Chapter 6, we refer to this as “Simpson’s 
paradox.” 

The lesson of Example 5.10 is more subtle than just “association does 

A not imply causation.” Even when direct causation is present, it may not be 

the whole explanation for a correlation. You must still worry about lurking 

variables. Careful statistical studies try to anticipate lurking variables and measure 

them. The mother-daughter study did measure TV viewing, exercise, and diet. 

Elaborate statistical analysis can remove the effects of these variables to come closer 

to the direct effect of mother’s BMI on daughter’s BMI. This remains a second-best 

approach to causation. The best way to get good evidence that x causes y is to do 

an experiment in which we change x and keep lurking variables under control. We 
will discuss experiments in Chapter 9. 

When experiments cannot be done, explaining an observed association can be 
difficult and controversial. Many of the sharpest disputes in which statistics plays 
a role involve questions of causation that cannot be settled by experiment. Do gun 
control laws reduce violent crime? Does using cell phones cause brain tumors? Has 
increased free trade widened the gap between the incomes of more educated and 
less educated American workers? All of these questions have become public issues. 


experiment 
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All concern associations among variables. And all have this in common: they try 
to pinpoint cause and effect in a setting involving complex relations among many 
interacting variables. 


Despite the difficulties, it is sometimes possible to build a strong case for causation in 
the absence of experiments. The evidence that smoking causes lung cancer is about 
as strong as nonexperimental evidence can be. 

Doctors had long observed that most lung cancer patients were smokers. 
Comparison of smokers and “similar” nonsmokers showed a very strong association 
between smoking and death from lung cancer. Could the association be explained 
by lurking variables? Might there be, for example, a genetic factor that predisposes 
people both to nicotine addiction and to lung cancer? Smoking and lung cancer 
would then be positively associated even if smoking had no direct effect on the lungs. 
How were these objections overcome? @ 
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Let’s answer this question in general terms: what are the criteria for establishing 
causation when we cannot do an experiment? 


The association is strong. The association between smoking and lung cancer is 
very strong. 


The association is consistent. Many studies of different kinds of people in many 
countries link smoking to lung cancer. That reduces the chance that a lurking 
variable specific to one group or one study explains the association. 


Higher doses are associated with stronger responses. People who smoke more 
cigarettes per day or who smoke over a longer period get lung cancer more 
often. People who stop smoking reduce their risk. 


The alleged cause precedes the effect in time. Lung cancer develops after years of 
smoking. The number of men dying of lung cancer rose as smoking became 
more common, with a lag of about 30 years. Lung cancer kills more men than 
any other form of cancer. Lung cancer was rare among women until women 
began to smoke. Lung cancer in women rose along with smoking, again with a 
lag of about 30 years, and has now passed breast cancer as the leading cause of 
cancer death among women. 


The alleged cause is plausible. Experiments with animals show that tars from 
cigarette smoke do cause cancer. 


Medical authorities do not hesitate to say that smoking causes lung cancer. The 
U.S. Surgeon General has long stated that cigarette smoking is “the largest avoid- 
able cause of death and disability in the United States.”!! The evidence for causation 
is overwhelming—but it is not as strong as the evidence provided by well-designed 
experiments. 


Apply Your Knowledge 


5.17 Another Reason Not to Smoke? A stop-smoking booklet says, “Children 
of mothers who smoked during pregnancy scored nine points lower on intel- 
ligence tests at ages three and four than children of nonsmokers.” Suggest 
some lurking variables that may help explain the association between smok- 
ing during pregnancy and children’s later test scores. The association by itself 
is not good evidence that mothers’ smoking causes lower scores. 


5.18 Education and Income. There is a strong positive association between 
workers’ education and their income. For example, the U.S. Census Bureau 
reported in 2012 that the mean income of young adults (ages 25 to 34) who 
worked full-time increased from $19,174 for those with less than a ninth- 
grade education, to $29,058 for high school graduates, to $49,796 for holders 
of a bachelor’s degree, and on up for yet more education. In part, this associa- 
tion reflects causation—education helps people qualify for better jobs. Sug- 
gest several lurking variables that also contribute. (Ask yourself what kinds 
of people tend to get more education.) 


5.19 To Earn More, Get Married? Data show that men who are married, and 
also divorced or widowed men, earn quite a bit more than men the same age 
who have never been married. This does not mean that a man can raise his 
income by getting married, because men who have never been married are 
different from married men in many ways other than marital status. Sug- 
gest several lurking variables that might help explain the association between 
marital status and income. 


CHAPTER 5 SUMMARY 


Chapter Specifics 


A regression line is a straight line that describes how a response variable y changes as 
an explanatory variable x changes. You can use a regression line to predict the value of 
y for any value of x by substituting this x into the equation of the line. 


The slope b of a regression line y = a + bx is the rate at which the predicted response 
y changes along the line as the explanatory variable x changes. Specifically, b is the 
change in y when x increases by 1. 


The intercept a of a regression line y = a + bx is the predicted response y when the 
explanatory variable x=0. This prediction is of no statistical interest unless x can actu- 
ally take values near 0. 


The most common method of fitting a line to a scatterplot is least squares. The least- 
squares regression line is the straight line y = a + bx that minimizes the sum of the 
squares of the vertical distances of the observed points from the line. 


The least-squares regression line of y on x is the line with slope b = rs,/s, and intercept 
a =¥y — bx. This line always passes through the point (x, 9). 


Correlation and regression are closely connected. The correlation r is the slope of 
the least-squares regression line when we measure both x and y in standardized units. 
The square of the correlation 1’ is the fraction of the variation in one variable that is 
explained by least-squares regression on the other variable. 


Correlation and regression must be interpreted with caution. Plot the data to be sure 
the relationship is roughly linear and to detect outliers and influential observations. A 
plot of the residuals makes these effects easier to see. 


Look for influential observations, individual points that substantially change the cor- 
relation or the regression line. Outliers in the x direction are often influential for the 
regression line. 


Be aware of ecological correlation, the tendency for correlations based on averages to 
be stronger than correlations based on individuals. Be careful not to misinterpret cor- 
relations based on averages as applying to individuals. 


Avoid extrapolation, the use of a regression line for prediction for values of the explan- 
atory variable far outside the range of the data from which the line was calculated. 


Summary 
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m Lurking variables may explain the relationship between the explanatory and response 
variables. Correlation and regression can be misleading if you ignore important lurking 
variables. 


=m Most of all, be careful not to conclude that there is a cause-and-effect relationship 
between two variables just because they are strongly associated. High correlation does 
not imply causation. The best evidence that an association is due to causation comes 
from an experiment in which the explanatory variable is directly changed and other 
influences on the response are controlled. 


Link It 


In this chapter we continued our study of exploring relationships between two variables, 
begun in Chapter 4. We use the least-squares regression line to describe the straight-line re- 
lationship between two variables when such a pattern is seen in a scatterplot. The equation 
of the least-squares regression line is a numerical summary that makes precise the notion of 
“straight-line relationship.” 

Even if the least-squares regression line is a good description of the relationship be- 
tween the observed values of two variables, we must exercise caution in how we interpret 
this relationship. Such interpretations rest on the assumption that the relationship is valid 
in some broader sense. We will explore this more carefully later in this book, but in this 
chapter we content ourselves with providing some cautions. Association, as indicated by 
a large correlation, does not imply that there is an underlying cause-and-effect relation 
between the response and explanatory variables. There may, in fact, be a lurking variable 
that influences the interpretation of any relation between the response and explanatory 
variables. Correlations based on averages of measurements tend to be higher than correla- 
tions based on the individual observations used to compute the averages. Be careful not 
to misinterpret correlations based on averages as applying to individuals. Finally, be careful 
not to use the least-squares regression line to make predictions outside the range of values 
of the explanatory variable that you used to obtain the line. 

In this chapter, we studied relationships between two quantitative variables. In the 
next chapter, we explore the relationship between two qualitative variables. 


(4) LaunchPad Online Resources 


If you are having difficulty with any of the sections of this chapter, these online 
resources should help prepare you to solve the exercises at the end of this chapter. 


° StatTutor starts with a video review of each section and asks a series of 
questions to check your understanding. 


° LearningCurve provides you with a series of questions about the chapter 
geared to your level of understanding. 


| CHECK YOUR SKILLS 


5.20 


School GPA 


Figure 5.10 is a scatterplot of school GPA against 
IQ test scores for 15 seventh-grade students. The 
line is the least-squares regression line for predict- 
ing school GPA from IQ score. If another child in 
this class has IQ score 110, you predict the school 
GPA to be close to 


(a) 2. (b) 7.5. (c) 11. 


100 110 120 130 


1Q test score 


FIGURE 5.10 
Scatterplot of IQ test scores and school GPA for 15 seventh-grade students, 
for Exercises 5.20 and 5.21. 
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The slope of the line in Figure 5.10 is closest to 
(@) =, (b) 0.2. (c) 2.0. 


The points on a scatterplot lie close to the line whose 
equation is y = 4 — 3x. The slope of this line is 


(a) 4. (b) 3. @) =3. 
Fred keeps his savings in his mattress. He began with 


$1000 from his mother and adds $100 each year. His 
total savings y after x years are given by the equation 


(a) y = 1000 + 100x. 
(b) y = 100 + 1000x. 
(c) y = 1000 + x. 


Smokers don’t live as long (on the average) as non- 
smokers, and heavy smokers don’t live as long as light 
smokers. You regress the age at death of a group of 
male smokers on the number of packs per day they 
smoked. The slope of your regression line 

(a) will be greater than 0. 

(b) will be less than 0. 

(c) can’t be determined without seeing the data. 

An owner of a home in the Midwest installed solar 
panels to reduce heating costs. After installing the so- 
lar panels, he measured the amount of natural gas used 


5.26 
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5.29 
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y (in cubic feet) to heat the home and outside tem- 
perature x (in degree-days, where a day’s degree-days 
are the number of degrees its average temperature falls 
below 65°F) over a 23-month period. He then com- 
puted the least-squares regression line for predicting y 
from x and found it to be” 


y = 85 + lox 


How much, on average, does gas used increase for 
each additional degree-day? 


(a) 23 cubic feet (b) 85 cubic feet 
(c) 16 cubic feet 


According to the regression line in Exercise 5.25, the 
predicted amount of gas used when the outside tem- 
perature is 20 degree-days is about 


(a) 405 cubic feet. 
(b) 320 cubic feet. 
(c) 105 cubic feet. 


By looking at the equation of the least-squares re- 
gression line in Exercise 5.25, you can see that the 
correlation between amount of gas used and degree- 
days is 

(a) greater than zero. 

(b) less than zero. 

(c) unable to be determined without seeing the data. 


The software used to compute the least-squares re- 
gression line in Exercise 5.25 says that r? = 0.98. This 
suggests that 


(a) although degree-days and gas used are correlated, 
degree-days do not predict gas used very accurately. 
(b) gas used increases by V0.98 = 0.99 cubic feet for 
each additional degree-day. 

(c) prediction of gas used from degree-days will be 
quite accurate. 


Researchers measured the percent body fat and 
the preferred amount of salt (percent weight/ 
volume) for several children. Here are data for 
seven children: fijy) sat 


Preferred amount of salt x 


C2 O13 O4 Os Oks is) Ili 


Percent body fat y 20 


3) 2 ST) Ss 3) 0) 


Use your calculator or software: what is the equation 
of the least-squares regression line for predicting per- 
cent body fat from preferred amount of salt? 

(a) y = 24.2 + 6.0x 

(b) y = 0.15 + 0.01x 

(c) ¥ = 6.0 + 24.2x 
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| CHAPTER 5 EXERCISES 


5.30 Penguins diving. 


5.31 


5.32 


A study of king 
penguins looked 
for a relation- 
ship between how 
deep the penguins 
dive to seek food 
and how long they 
stay underwater." 
For all but the shallowest dives, there is a linear rela- 
tionship that is different for different penguins. The 
study report gives a scatterplot for one penguin titled 
“The relation of dive duration (DD) to depth (D).” 
Duration DD is measured in minutes, and depth D is 
in meters. The report then says, “The regression equa- 
tion for this bird is: DD = 2.69 + 0.0138D.” 


(a) What is the slope of the regression line? Explain 
in specific language what this slope says about this 
penguin’s dives. 

(b) According to the regression line, how long does a 
typical dive to a depth of 200 meters last? 

(c) The dives varied from 40 meters to 300 meters in 
depth. Use the regression equation to determine DD 
for D = 40 and D = 300 and then plot the regres- 
sion line from D = 40 to D = 300. 


The price of diamond rings. A newspaper advertise- 
ment in the Straits Times of Singapore contained pictures 
of diamond rings and listed their prices, diamond weight 
(in carats), and gold purity. Based on data for only the 
20-carat gold ladies’ rings in the advertisement, the least- 
squares regression line for predicting price (in Singapore 
dollars) from the weight of the diamond (in carats) is! 


Price = 259.63 + 3721.02 Carats 


(a) What does the slope of this line say about the 
relationship between price and number of carats? 
(b) What is the predicted price when number of 
carats = 0? How would you interpret this price? 


Does social rejection hurt? Exercise 4.46 (page 125) 
gives data from a study that shows that social exclusion 
causes “real pain.” That is, activity in an area of the brain 
that responds to physical pain goes up as distress from 
social exclusion goes up. A scatterplot shows a moderately 
strong linear relationship. Figure 5.11 shows Minitab 
regression output for these data. 


(a) What is the equation of the least-squares regression 
line for predicting brain activity from social distress 
score? Interpret the slope in the context of the prob- 
lem. Use the equation to predict brain activity for a 
social distress score of 2.0. 

(b) What percent of the variation in brain activity 
among these subjects is explained by the straight-line 
relationship with social distress score? 
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Regression Analysis: Brain versus Distress 
The regression equation is 
Brain = -0.126 + 0.0608 distress 
Predictor Coef SE Coef T P 
Constant -0.12608 0.02465 -5.12 0.000 
distress 0.060782 0.009979 6.09 0.000 
0.0250896  R-Sq = 77.1% R-Sq (adj) = 75.1% 
js 
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FIGURE 5.11 


Minitab regression output for a study of the effects of social rejection on 
brain activity, for Exercise 5.32. 
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(c) Use the information in Figure 5.11 to find the 
correlation r between social distress score and brain 
activity. How do you know whether the sign of r is 
positive or negative? 


Toucan’s beak. Exercise 4.45 (page 125) gives data on 
beak heat loss, as a percent of total body heat loss from 
all sources, at various temperatures. The data show that 
beak heat loss is higher at higher temperatures and that 
the relationship is roughly linear. Figure 5.12 shows 
Minitab regression output for these data. £jjy)) 


(a) What is the equation of the least-squares regression 
line for predicting beak heat loss, as a percent of total 
body heat loss from all sources, from temperature? 
Explain in specific language what this slope says about 
the relationship between beak heat loss and temperature. 
(b) Use the equation of the least-squares regression line to 
predict beak heat loss, as a percent of total body heat loss 
from all sources, at a temperature of 25 degrees Celsius. 


Soe 


Regression Analysis: Percent heat loss versus Temperature 


The 


regression equation is 


Percent heat loss = 0.92 + 2.06 Temperature 


Predictor 
Constant 
Pairs 


4.50655 82.4% 


R-Sq = 83.6% R-Sq(adj) = 


FIGURE 5.12 
Minitab regression output for a study of how temperature affects beak 
heat loss in toucans, for Exercise 5.33. 
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(c) What percent of the variation in beak heat loss 

is explained by the straight-line relationship with 
temperature? 

(d) Use the information in Figure 5.12 to find the cor- 
relation r between beak heat loss and temperature. How 
do you know whether the sign of r is positive or negative? 


Husbands and wives. The mean height of American 
women in their twenties is about 64.3 inches, and 
the standard deviation is about 2.7 inches. The mean 
height of men the same age is about 69.9 inches, with 
standard deviation about 3.1 inches. Suppose that the 
correlation between the heights of husbands and wives 
is about r = 0.5. 


(a) What are the slope and intercept of the regression 
line of the husband’s height on the wife’s height in young 
couples? Interpret the slope in the context of the problem. 
(b) Draw a graph of this regression line for heights of 
wives between 56 and 72 inches. Predict the height 
of the husband of a woman who is 67 inches tall, and 
plot the wife’s height and predicted husband’s height 
on your graph. 

(c) You don’t expect this prediction for a single 
couple to be very accurate. Why not? 


What’s my grade? In Professor Krugman’s econom- 
ics course, the correlation between the students’ total 
scores prior to the final examination and their final- 
examination scores is r = 0.5. The pre-exam totals for 
all students in the course have mean 280 and standard 
deviation 40. The final-exam scores have mean 75 
and standard deviation 8. Professor Krugman has lost 
Julie’s final exam but knows that her total before the 
exam was 300. He decides to predict her final-exam 
score from her pre-exam total. 


(a) What is the slope of the least-squares regression 
line of final-exam scores on pre-exam total scores in 
this course? What is the intercept? Interpret the slope 
in the context of the problem. 

(b) Use the regression line to predict Julie’s final- 
exam score. 

(c) Julie doesn’t think this method accurately predicts 
how well she did on the final exam. Use 1’ to argue 
that her actual score could have been much higher 
(or much lower) than the predicted value. 


Going to class. A study of class attendance and grades 
among first-year students at a state university showed 
that, in general, students who attended a higher percent 
of their classes earned higher grades. Class attendance 
explained 16% of the variation in grade index among the 
students. What is the numerical value of the correlation 
between percent of classes attended and grade index? 

Sisters and brothers. How strongly do physical char- 
acteristics of sisters and brothers correlate? Here are 
data on the heights (in inches) of 12 adult pairs:'¢ 
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Brother 


TAM CS 66675710 S71 iOS 7371265 56670 


Sister 


CO 64) 65) O35 05625 Go 6460 59 62 64 
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(a) Use your calculator or software to find the cor- 
relation and the equation of the least-squares line for 
predicting sister’s height from brother’s height. Make 
a scatterplot of the data, and add the regression line to 
your plot. 

(b) Damien is 70 inches tall. Predict the height of his 
sister Tonya. Based on the scatterplot and the cor- 
relation r, do you expect your prediction to be very 
accurate? Why? 


Keeping water clean. Keeping water supplies clean 
requires regular measurement of levels of pollutants. 
The measurements are indirect—a typical analysis in- 
volves forming a dye by a chemical reaction with the 
dissolved pollutant, then passing light through the 
solution and measuring its “absorbence.” To calibrate 
such measurements, the laboratory measures known 
standard solutions and uses regression to relate ab- 
sorbence and pollutant concentration. This is usually 
done every day. Here is one series of data on the absor- 
bence for different levels of nitrates. Nitrates are mea- 
sured in milligrams per liter of water.” £j))) NiTRATES 


Nitrates 


50 50 100 200 400 800 1200 1600 2000 2000 


Absorbence 


7.0 7.5 12.8 24.0 47.0 93.0 138.0 183.0 230.0 226.0 
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Percent return x 


New adults y 


(a) Chemical theory says that these data should lie on 
a straight line. If the correlation is not at least 0.997, 
something went wrong, and the calibration procedure 
is repeated. Plot the data and find the correlation. 
Must the calibration be done again? 

(b) The calibration process sets nitrate level and 
measures absorbence. The linear relationship that 
results is used to estimate the nitrate level in water 
from a measurement of absorbence. What is the 
equation of the line used to estimate nitrate level? 
What does the slope of this line say about the 
relationship between nitrate level and absorbence? 
What is the estimated nitrate level in a water 
specimen with absorbence 40? 

(c) Do you expect estimates of nitrate level from 
absorbence to be quite accurate? Why? 


Sparrowhawk colonies. One of nature’s patterns con- 
nects the percent of adult birds in a colony that return 
from the previous year and the number of new adults 
that join the colony. Here are data for 13 colonies of 
sparrowhawks:'* Sy) sPARROW 


74 66 81 52 73 62 52 45 62 46 60 46 38 
5 © & Wil i We IG iy ile) ie} We) AO) 210 


You saw in Exercise 4.28 that there is a moderately strong 
linear relationship, with correlation r = —0.748. 
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(a) Find the least-squares regression line for predict- 
ing y from x. Make a scatterplot and draw your line 
on the plot. 

(b) Explain in words what the slope of the regression 
line tells us. 

(c) An ecologist uses the line, based on 13 colonies, 
to predict how many new birds will join another 
colony, to which 60% of the adults from the previous 
year return. What is the prediction? 

Global warming. Exercise 4.43 (page 124) gives data 
on annual average global temperatures for the last 20 
years in degrees Celsius. ) 


(a) Find the least-squares regression line for predict- 
ing average global temperature from year. Make a 
scatterplot and draw your line on the plot. 

(b) Explain in words what the slope of the regression 
line tells us. 

(c) An environmentalist uses the line, based on the 
20 years, to predict average global temperatures in 
2050. What is the prediction? How reliable do you 
think your prediction is? 


Our brains don’t like losses. Exercise 4.29 (page 121) 
describes an experiment that showed a linear relation- 
ship between how sensitive people are to monetary losses 
(“behavioral loss aversion”) and activity in one part of 
their brains (“neural loss aversion”). £jy,) Losses 


(a) Make a scatterplot with neural loss aversion as x 
and behavioral loss aversion as y. One point is a high 
outlier in both the x and y directions. 

(b) Find the least-squares line for predicting y from x, 
leaving out the outlier, and add the line to your plot. 
(c) The outlier lies very close to your regression line. 
Looking at the plot, you now expect that adding the 
outlier will increase the correlation but will have little 
effect on the least-squares line. Explain why. 

(d) Find the correlation with and without the outlier. 
Your results verify the expectations from part (c). 
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Always plot your data! Table 5.1 presents four sets of 
data prepared by the statistician Frank Anscombe to 
illustrate the dangers of calculating without first plot- 
ting the data.!” 


(a) Without making scatterplots, find the correlation 
and the least-squares regression line for all four data 
sets. What do you notice? Use the regression line to 
predict y for x = 10. {ij} anscome A,B, ¢, and D 


(b) Make a scatterplot for each of the data sets, and 
add the regression line to each plot. 


(c) In which of the four cases would you be 
willing to use the regression line to describe the 
dependence of y on x? Explain your answer in 
each case. 


Managing diabetes. People with diabetes must manage 
their blood sugar levels carefully. They measure their 
fasting plasma glucose (FPG) several times a day with a 
glucose meter. Another measurement, made at regular 
medical checkups, is called HbA. This is roughly the per- 
cent of red blood 
cells that have a 
glucose molecule 
attached. It mea- 
sures average ex- 
posure to glucose 
over a period of 
several months. 
Table 5.2 gives 
data on both HbA 
and FPG for 18 diabetics five months after they had 
completed a diabetes education class.”° jy.) DIABETES 


(a) Make a scatterplot with HbA as the explanatory 
variable. There is a positive linear relationship, but it 
is surprisingly weak. 

(b) Subject 15 is an outlier in the y direction. Subject 
18 is an outlier in the x direction. Find the correlation 


DATA SETA 
x 10 8 13 9 11 6 4 12 ri 5 
y SOA NO. Im) ONO? O] LEDS: 5 5 NS OMEN 2 4 A> ORO. O4 aE ODED. O0) 
DATA SET B 
x 10 8 13 9 11 6 4 12 7 3 
y 9.14 814 874 8.77 9.26 810 613 310 913 7.26 4.74 
DATA SET C 
x 10 8 13 9 11 6 4 12 7 5 
y Wake (05a fiil Zssiik tes eNOS SS) hI) Sr) 
DATA SET D 
x 8 8 8 8 8 8 8 8 8 19 
y OS OMe 8:84 nS: 41l a O42) ON CO LOCO IZ 0) 
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HbA FPG HbA FPG HbA FPG 
SUBJECT (%) (mg/mL) SUBJECT (%) (mg/mL) SUBJECT (%) (mg/mL) 

I 6.1 141 qi 7.5 96 1B 10.6 103 

2 6.3 158 8 i 78 14 10.7 172 

3 6.4 i 9 7.9 148 15 10.7 359 

4 6.8 153 10 A 172 16 11.2 145 

5 To 134 i 9.4 200 17 Oe 147 

6 71 95 12 10.4 Mi 18 19.3 255 
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for all 18 subjects, for all except Subject 15, and for 
all except Subject 18. Are either or both of these sub- 
jects influential for the correlation? Explain in simple 
language why r changes in opposite directions when 
we remove each of these points. 


The effect of changing units. The equation of a re- 
gression line, unlike the correlation, depends on the 
units we use to measure the explanatory and response 
variables. Return to the data on percent body fat and 
preferred amount of salt in Exercise 4.23: SALTS 


Preferred amount of salt x 


0.2 03 04 0.5 06 08 1.1 


Percent body fat y 


20S Oe? OS Cleo ms 0) 


In calculating the preferred amount of salt, the weight 
of the salt was in milligrams. 


(a) Find the equation of the regression line for pre- 
dicting percent body fat from preferred amount of salt 
when weight is in milligrams. 

(b) A mad scientist decides to measure weight in 
tenths of milligrams. The same data in these units are 


Preferred amount of saltx 2 3 4 5 @ 


11 


Percent body fat y 


MQ 30 22 30 3 23 310) 
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Find the equation of the regression line for predicting 
percent body fat from preferred amount of salt when 
weight is in tenths of milligrams. 

(c) Use both lines to predict the percent body fat from 
preferred amount of salt for a child with preferred amount 
of salt 0.7 when weight is measured in milligrams, which is 
the same as 7 when weight is in tenths of milligrams. Are 
the two predictions the same (up to any roundoff error)? 


Managing diabetes, continued. Add three regression 
lines for predicting FPG from HbA to your scatterplot 
from Exercise 5.43: for all 18 subjects, for all except 
Subject 15, and for all except Subject 18. Is either Subject 
15 or Subject 18 strongly influential for the least-squares 
line? Explain in simple language what features of the scat- 
terplot explain the degree of influence. £f))) DIABETES 


Are you happy? Exercise 4.25 (page 120) discusses 
a study in which the mean BRFSS life-satisfaction 
score of individuals in each state was compared with 
the mean of an objective measure of well-being (based 
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on the “compensating-differentials method”) for each 
state. Suppose that instead of the means for the states, 
the BRFSS life-satisfaction scores for individuals were 
compared with the corresponding measure of well- 
being (based on the compensating-differentials method) 
for these individuals. Would you expect the correla- 
tion between the mean state scores on these two mea- 
sures to be lower, about the same, or higher than the 
correlation between the scores of individuals on these 
two measures? Explain your answer. 


One more inch, three more pounds. Data on the av- 
erage weight of men who are between 5 feet 2 inches 
and 6 feet 4 inches tall (rounded to the nearest inch) 
show a very high positive correlation with height. 
Would the correlation be greater, smaller, or about 
the same if you calculated the correlation between the 
weights of individual men and their heights (rounded 
to the nearest inch)? Explain your answer. 


Do artificial sweeteners cause weight gain? People 
who use artificial sweeteners in place of sugar tend to 
be heavier than people who use sugar. Does this mean 
that artificial sweeteners cause weight gain? Give a 
more plausible explanation for this association. 


Learning online. Many colleges offer online versions 
of courses that are also taught in the classroom. It of- 
ten happens that the students who enroll in the online 
version do better than the classroom students on the 
course exams. This does not show that online instruc- 
tion is more effective than classroom teaching, because 
the people who sign up for online courses are often quite 
different from the classroom students. Suggest some dif- 
ferences between online and classroom students that 
might explain why online students do better. 


SAT scores and teacher salaries. The data set “tchsal” 
gives the mean Mathematics SAT score and mean salary 
of teachers in each of the 50 states and the District of 
Columbia in 2013.2! The correlation between mean 
Mathematics SAT score and mean teacher salary is 


(a) Find the least-squares line for predicting mean 
Mathematics SAT score from mean teacher salary. 
Interpret the slope in the context of the problem. 
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(b) Is it reasonable to conclude that states can increase 
the mean Mathematics SAT score in their state by 
reducing teacher salaries? (Hint: Which states have the 
highest and lowest mean Mathematics SAT scores? 
Which states have the highest and lowest cost of living?) 


SAT scores and teacher salaries, continued. The 
data set “tchsal2” gives the mean Mathematics SAT 
score and mean salary of teachers in each of the 50 
states and the District of Columbia in 2013. It also 
includes a categorical variable, pct. taking, that indi- 
cates whether the percentage taking is above 40% (Y) 
or below 40% (N). 2.) tcHsAL2 


(a) Find the least-squares line for predicting mean 
Mathematics SAT score from mean teacher salary for 
only the cases where the percent taking is above 40%. 
Interpret the slope in the context of the problem. 

(b) Find the least-squares line for predicting mean 
Mathematics SAT score from mean teacher salary for 
only the cases where the percent taking is below 40%. 
Interpret the slope in the context of the problem. 

(c) If you did Exercise 4.48, compare your results with 
those in part (a) of Exercise 4.48. What do you conclude? 


Grade inflation and the SAT. The effect of a lurking vari- 
able can be surprising when individuals are divided into 
groups. In recent years, the mean SAT score of all high 
school seniors has increased. But the mean SAT score has 
decreased for students at each level of high school grades 
(A, B, C, and so on). Explain how grade inflation in high 
school (the lurking variable) can account for this pattern. 


Workers’ incomes. Here is another example of the 
group effect cautioned about in the previous exercise. 
Explain how, as a nation’s population grows older, 
median income can go down for workers in each age 
group, yet still go up for all workers. 


Some regression math. Use the equation of the least- 
squares regression line (box on page 132) to show that 
the regression line for predicting y from x always passes 
through the point (x,y). That is, when x = x, the 
equation gives y =y. 


Regression to the mean. Figure 4.8 (page 120) displays 
the relationship between golfers’ scores on the first and 
second rounds of the 2013 Masters Tournament. The 
least-squares line for predicting second-round scores from 
first-round scores has equation y = 56.47 + 0.243. Find 
the predicted second-round scores for a player who shot 80 
in the first round and for a player who shot 70. The mean 
second-round score for all players was 74.20. So, a player 
who does well in the first round is predicted to do less well, 
but still better than average, in the second round. In addi- 
tion, a player who does poorly in the first is predicted to do 
better, but still worse than average, in the second. 

(Comment: This is regression to the mean. If you 
select individuals with extreme scores on some measure, 
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they tend to have less extreme scores when measured 
again. That’s because their extreme position is partly 
merit and partly luck, and the luck will be different next 
time. Regression to the mean contributes to lots of “ef- 
fects.” The rookie of the year often doesn’t do as well 
the next year; the best player in an orchestral audition 
may play less well once hired than the runners-up; a stu- 
dent who feels she needs coaching after taking the SAT 
often does better on the next try without coaching.) 


Regression to the mean. We expect that students who 
do well on the midterm exam in a course will usually 
also do well on the final exam. Gary Smith of Pomona 
College looked at the exam scores of all 346 students 
who took his statistics class over a 10-year period.” 
The least-squares line for predicting final exam score 
from midterm-exam score was y = 46.6 + 0.41x. 
(Both exams have a 100-point scale.) 

Octavio scores 10 points above the class mean on the 
midterm. How many points above the class mean do you 
predict that he will score on the final? (Hint: Use the fact 
that the least-squares line passes through the point (x, y) 
and the fact that Octavio’s midterm score is x + 10.) 
This is another example of regression to the mean: stu- 
dents who do well on the midterm will, in general, do less 
well, but still above average, on the final. 


Is regression useful? In Exercise 4.42 (page 124) you 
used the Correlation and Regression applet to create 
three scatterplots having correlation about r = 0.7 be- 
tween the horizontal variable x and the vertical vari- 
able y. Create three similar scatterplots again, and click 
the “Show least-squares line” box to display the regres- 
sion lines. Correlation r = 0.7 is considered reasonably 
strong in many areas of work. Because there is a reason- 
ably strong correlation, we might use a regression line to 
predict y from x. In which of your three scatterplots does 
it make sense to use a straight line for prediction? 


Guessing a regression line. In the Correlation and 
Regression applet, click on the scatterplot to create a 
group of 15 to 20 points from lower left to upper right 
with a clear positive straight-line pattern (correlation 
around 0.7). Click the “Draw line” button and use the 
mouse (right-click and drag) to draw a line through 
the middle of the cloud of points from lower left to 
upper right. Note the “thermometer” above the plot. 
The red portion is the sum of the squared vertical dis- 
tances from the points in the plot to the least-squares 
line. The green portion is the “extra” sum of squares 
for your line—it shows by how much your line misses 
the smallest possible sum of squares. 


(a) You drew a line by eye through the middle of the 
pattern. Yet the right-hand part of the bar is probably 
almost entirely green. What does that tell you? 

(b) Now click the “Show least-squares line” box. Is 
the slope of the least-squares line smaller (the new 
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TIME DISTANCE HAND TIME DISTANCE HAND 
115 190.70 right 240 190.70 left 
96 138.52 right 190 1S}e.52 left 
110 165.08 right 170 165.08 left 
100 126.19 right 125 126.19 left 
111 163.19 right Bills 163.19 left 
101 305.66 right 240 305.66 left 
111 176.15 right 141 176.15 left 
106 162.78 right 210 162.78 left 
96 147.87 right 200 147.87 left 
96 271.46 right 401 271.46 left 
5) 40.25 right 320 40.25 left 
96 24.76 right iis) 24.76 left 
96 104.80 right 176 104.80 left 
106 136.80 right PML 136.80 left 
100 308.60 right 238 308.60 left 
113 279.80 right 316 279.80 left 
123 175),5)1l right 176 125.51 left 
111 329.80 right 73) 329.80 left 
95 51.66 right 210 51.66 left 
108 201.95 right 170 201.95 left 


line is less steep) or larger (line is steeper) than that 
of your line? If you repeat this exercise several times, 
you will consistently get the same result. The least- 
squares line minimizes the vertical distances of the 
points from the line. It is not the line through the 
“middle” of the cloud of points. This is one reason 
why it is hard to draw a good regression line by eye. 


The following exercises ask you to answer questions from data without 
having the details outlined for you. The exercise statements give you 
the State step of the four-step process. In your work, follow the Plan, 
Solve, and Conclude steps of the process, described on page 63. 


5.59 Beavers and beetles. Do beavers benefit beetles? Research- 


ers laid out 23 circular plots, each four meters in diameter, 
in an area where beavers were cutting down cottonwood 
trees. In each plot, they counted the number of stumps 
from trees cut by beavers and the number of clusters of 
beetle larvae. Ecologists think that the new sprouts from 
stumps are more tender than other cottonwood growth, so 
that beetles prefer them. Ifso, more stumps should produce 
more beetle larvae. Here are the data:” am BEAVERS 
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Analyze these data to see if they support the “beavers 
benefit beetles” idea. 


A computer game. A multimedia statistics learning 
system includes a test of skill in using the computer’s 
mouse. The software displays a circle at a random lo- 
cation on the computer screen. The subject clicks in 
the circle with the mouse as quickly as possible. A new 
circle appears as soon as the subject clicks in the old 
one. Table 5.3 gives data for one subject’s trials, 20 
with each hand. Distance is the distance from the cur- 
sor location to the center of the new circle, in units 
whose actual size depends on the size of the screen. 
Time is the time required to click in the new circle, in 
milliseconds.** We suspect that time depends on dis- 
tance. We also suspect that performance will not be 
the same with the right and left hands. Analyze the 
data with a view to predicting performance separately 


for the two hands. COMGAME 


Stumps 2 Z it 3 3 4 s it @ 5 3} 
Beetle larvae 10 30 12 24 36 40 43 11 27 #56 18 40 
Stumps 2 Lt 2 2 | 1 4 1 2 #1 4 
Beetle larvae 25 8 21 14 16 6 54 9 13 14 50 
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Predicting tropical storms. William Gray heads the 
Tropical Meteorology Project at Colorado State Uni- 
versity (well away from the hurricane belt). His fore- 
casts before each year’s hurricane season attract lots of 


= attention. Here are 
& data on the number of 
a3 named Atlantic tropi- 
gs cal storms predicted by 
aS Dr. Gray and the ac- 
3 E tual number of storms 
=e for the years 1984 to 
< 3 2OIB 2 STORMS2 
YEAR FORECAST ACTUAL | YEAR FORECAST ACTUAL 
1984 10 12) 1999 14 12 
1985 11 iil 2000 il2 14 
1986 8 6 2001 12 15 
1987 8 tl 2002 11 i 
1988 (ull iy 2003 14 16 
1989 it ili 2004 14 14 
1990 ite 14 2005 15 27 
1991 8 8 2006 17 10 
1992 8 6 2007 17 14 
1993 il 8 2008 15 16 
1994 9 fl 2009 11 9 
1995 12 19 2010 18 19 
1996 10 13 2011 16 19 
1997 11 fi 2012 14 19 
1998 10 14 2013 18 13} 
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Analyze these data. How accurate are Dr. Gray’s fore- 
casts? How many tropical storms would you expect in 
a year when his preseason forecast calls for 16 storms? 
What is the effect of the disastrous 2005 season on 
your answers? 


Great Arctic rivers. One effect of global warming is 
to increase the flow of water into the Arctic Ocean 
from rivers. Such an increase may have major effects 
on the world’s climate. Six rivers (Yenisey, Lena, Ob, 
Pechora, Kolyma, and Severnaya Dvina) drain two- 
thirds of the Arctic in Europe and Asia. Several of 
these are among the largest rivers on earth. Table 5.4 
presents the total discharge from these rivers each year 
from 1936 to 2010.”° Discharge is measured in cubic 
kilometers of water. Analyze these data to uncover 
the nature and strength of the trend in total discharge 
over time. ely) ARCTIC 


YEAR DISCHARGE YEAR DISCHARGE YEAR DISCHARGE YEAR DISCHARGE 
1936 AA 1955 1656 1974 2000 1993 1845 
1937 1713 1956 1721 1975 1928 1994 1902 
1938 1860 1957 1762 1976 1653 1995 1842 
1939 1739 1958 1936 1977 1698 1996 1849 
1940 1615 1959 1906 1978 2008 Ooi 2007 
1941 1838 1960 1736 1979 1970 1998 1903 
1942 1762 1961 1970 1980 1758 1999 1970 
1943 1709 1962 1849 1981 1774 2000 1905 
1944 1921 1963 1774 1982 1728 2001 1890 
1945 1581 1964 1606 1983 1920 2002 2085 
1946 1834 1965 1735 1984 1823 2003 1780 
1947 1890 1966 1883 1985 1822 2004 1900 
1948 1898 1967 1642 1986 1860 2005 1930 
14) 1958 1968 1713 1987 Be 2006 1910 
1950 1830 1969 1742 1988 1906 2007 2270 
1951 1864 1970 1751 1989 1932 2008 2078 
1952 1829 1971 1879 1990 1861 2009 1900 
1953 1652 1972 1736 1991 1801 2010 1813 
1954 1589 1973 1861 1992 1793 


5.63 Will women outrun men? Does the physiology of 

women make them better suited than men to long- 
42 distance running? Will women eventually outperform 
men in long-distance races? Researchers examined 
data on world record times (in seconds) for men and 
women in the marathon. Based on these data, re- 
searchers (in 1992) attempted to predict when women 
would outrun men in the marathon. Here are data for 
women: ”’ £ij)| MARATHON 
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Analyze these data using least-squares regression to 
estimate when men and women’s record times will be 
equal. How reliable is your estimate? (You may wish 
to look online to find the current men and women’s 
record times.) 


5.64 Will women outrun men?, continued. The data set 
“run2” contains all the world record times (in sec- 
onds) for men and women in the marathon up to 
2013. Use these data to repeat the analysis in Ex- 
ercise 5.63; that is, use least-squares regression to 


estimate when men and women’s record times will 


Year | 1926 1964 1967 1970 1971 1974 1975 answer. Sl 
Time 13,222.0 11,973.0 11,246.0 10,973.0 9990.0 9834.5 9499.0 =) 
Year | 1977 1980 1981 1982 1983 1985 
Time 9287.5 9027.0 8806.0 8771.0 8563.0 8466.0 
Here are data for men: 
Year 1908 1909 1913 1920s «1925. «1935. S197 
Time 10,518.4 9751.0 9366.6 9155.8 8941.8 8802.0 8739.0 
Year 1952 1953 1954 1958 1960 1963 1964 
Time 8442.2 8314.8 8259.4 8117.0 8116.2 8068.0 7931.2 
Year 1965 1967 1969 1981 1984 1985 1988 
Time 7920.0 7776.4 7713.6 7698.0 7685.0 7632.0 7610.0 


5.65 


5.66 


Exploring the Web 


Association and causation. Find an example of a study in which the issue of 
association and causation is present (either association is confused with causa- 
tion, or association is not confused with causation). Summarize the study and its 
conclusions in your own words. Be sure to include a copy of the actual article 
or at least the web source, title, and location where the article was published. 
The Chance News website at www. causeweb. org/wiki/chance/index. 
php/Main_Page is a good place to look for examples. 


Predicting batting averages. Go to www.mlb.com/ and find the batting 
averages for a diverse set of 30 players for both the 2012 and 2013 seasons. You can 
click on the “Stats” tab to find the results for the current season as well as historical 
data. You should select only players who played in at least 50 games both seasons. 
Make a scatterplot of the batting averages using the 2012 season average as the 
explanatory variable and the 2013 season average as the response. Is it reason- 
able to fit a straight line to these data? If so, find the least-squares regression line 
for predicting batting average in 2013 from that in 2012 based on your sample of 


be equal. How reliable s this estimate? Explain your 
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30 players. In 2012, the major league leader in batting was Buster Posey, who had 
a batting average of 0.336. What does your least-squares regression line predict for 
the 2013 batting average of someone who hit 0.336 in 2012? Is the 2013 predicted 
batting average higher or lower than 0.336? 


Predicting the federal budget. Go to the Congressional Budget Office website, 
www.cbo.gov/topics/. Click on the “topics and subtopics” button and se- 
lect “Budget and Economic Outlook.” Then click on “apply.” What is the current 
prediction for the federal budget in five years’ time? Is a surplus or a deficit pre- 
dicted? Do you think the prediction is accurate? Why or why not? 


e have concentrated on relationships in which at least the response variable 

is quantitative. Now we will describe relationships between two categorical 

variables. Some variables—such as sex, race, and occupation—are cat- 
egorical by nature. Other categorical variables are created by grouping values of a 
quantitative variable into classes. Published data often appear in grouped form to 
save space. To analyze categorical data, we use the counts or percents of individuals 
that fall into various categories. 


EXAMPLE 6.1 A Job Outside the Home 


A sample survey of adults (aged 18 and over) asked, “If you were free to do 
either, would you prefer to have a job outside the home, or would you prefer to 
stay home and take care of the house and family?” Table 6.1 shows the responses. ' 
This is a two-way table because it describes two categorical variables. One is 
the sex and education level of the respondent. The other is the preferred lifestyle 
(a job outside the home, stay home, or no preference). Sex and education is the 
row variable because each row in the table describes a combination of sex and 
education level. Because the education level has a natural order from “No college” to 
“College,” the rows are also in this order for women and men. Preferred lifestyle 


*This material is important in statistics, but it is needed later in this book only for Chapter 25. You 
may omit it if you do not plan to read Chapter 25, or delay reading it until you reach Chapter 25. 


Marginal distributions 
Conditional distributions 


Simpson’s paradox 


two-way table 


row variable 


163 


164. 


CHAPTER 6 & Two-Way Tables 


column variable 


PREFERRED LIFESTYLE 

SEX AND JOB OUTSIDE STAY HOME AND CARE NO 

EDUCATION HOME FOR HOUSE, FAMILY PREFERENCE| TOTAL 
Women no college 81 104 10 195 

Women college 173 115 15 303 

Men no college 92 32 2 126 

Men college 299 81 8 388 

Total 645 332 35 1012 


is the column variable because each column describes one choice. The entries 
in the table are the counts of individuals in each sex-and-education-level-by- 
preferred-lifestyle class. The entries in the right margin are the total of the row 
entries, the entries in the bottom margin are the total of the column entries, and 
the entry at the bottom right is the total of all adults in the study. m 


marginal distributions 


How can we best grasp the information contained in Table 6.1? First, look at the 
distribution of each variable separately. The distribution of a categorical variable says 
how often each outcome occurred. The “Total” column at the right of the table 
contains the totals for each of the rows. These row totals give the distribution of 
sex and education level in the entire group of 1012 adults: 195 were women with no 
college education, 303 were women with a college education, and so on. 

If the row and column totals are missing, the first thing to do in studying a two- 
way table is to calculate them. The distributions of sex and education alone and 
preferred lifestyle alone are called marginal distributions because they appear at 
the right and bottom margins of the two-way table. 

Percents are often more informative than counts. We can display the marginal 
distribution of sex and education in percents by dividing each row total by the table 
total and converting to a percent. 


The percent of these adults who were women with no college is 


women with no college total 195 
table total ~ 1012 


= 0.193 = 19.3% 


Do three more such calculations to obtain the marginal distribution of sex and educa- 
tion in percents for each group. Here is the complete distribution: 


Response Percent 
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It seems that more women and more men have attended college than not attended 
college. The total is 100% because everyone belongs to one of the four sex and 
education classes. m 


Each marginal distribution from a two-way table is a distribution for a single 
categorical variable. As we saw in Chapter 1, we can use a bar graph or a pie chart 
to display such a distribution. Figure 6.1 is a bar graph of the distribution of sex and 
education among adults in the sample. 

In working with two-way tables, you must calculate lots of percents. Here’s a tip 
to help you decide what fraction gives the percent you want. Ask, “What group 
represents the total of which I want a percent?” The count for that group is the 
denominator of the fraction that leads to the percent. In Example 6.2, we want a 
percent “of adults,” so the count of adults (the table total) is the denominator. 


Percent of adults 


FIGURE 6.1 
Women no Women Men no Men Bar graph of the distribution of sex and 
college college college college education of adults who participated in 
P the survey. This is one of the marginal 
Sex and education 


distributions for Table 6.1. 


(3) LaunchPad Online Resources 


* The StatBoards video, Marginal Distributions, provides the details of 
calculating marginal distributions through an example. 


Apply Your Knowledge 


6.1 Video Gaming and Grades. The popularity of computer, video, online, 
and virtual reality games has raised concerns about their ability to negatively 
affect youth. The data in this exercise are based on a recent survey of 14- to 
18-year-olds in Connecticut high schools. Here are the grade distributions of 
boys who have and have not played video games.’ £y))| GAMING 


Grade Average 
AsandBs_ (Cs Ds and Fs 


Played games 736 450 193 
Never played games 205 144 80 
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ay SMILING FACES 
Women smile more than 


men. The same data that produce 
this fact allow us to link smiling to 
other variables in two-way tables. 

For example, as the second variable, 
add whether or not the person thinks 
they are being observed. If yes, that’s 
when women smile more. If no, 
there’s no difference between women 
and men. Next, take the second 
variable to be the person’s social role 
(for example, is he or she the boss in 
an office?). Within each role, there 

is very little difference in smiling 
between women and men. 


(a) How many people does this table describe? How many of these have 
played video games? 


(b) Give the marginal distribution of the grades. What percent of the boys 
represented in the table received a grade of C or lower? 


6.2 Ages of College Students. Here is a two-way table of U.S. Census Bureau 
data describing the age and sex of all American students enrolled in college. 
The table entries are counts in thousands of students.’ Ely) AGES 


Age Group Female Male 
18 to 24 years 6432 5640 
25 to 34 years 2450 1843 
35 years or older 2124 1069 


(a) How many college undergraduates are there? 


(b) Find the marginal distribution of age group. What percent of under- 
graduates are in the 18- to 24-year-old college age group? 


Table 6.1 contains much more information than the two marginal dis- 

A tributions of sex and education alone and preferred lifestyle alone. Marginal 

distributions tell us nothing about the relationship between two variables. To 

describe a relationship between two categorical variables, we must calculate some 
well-chosen percents from the counts given in the body of the table. 

Let’s say we want to compare the preferred lifestyles of women and men with 
different levels of education. To do this, compare percents for each sex and 
education category. To study the preferred lifestyles of women who have not 
been to college, we look only at the “Women no college” row in Table 6.1. To 
find the percent of women with no college who prefer a job outside the home, 
divide the count of such women by the total number of women with no college 
(the row total): 


women with no college who prefer ajob outside the home 81 
row total 195 


0.415 = 41.5% 


Doing this for all three entries in the “Women no college” row gives the conditional 
distribution of preferred lifestyles among women with no college. We use the term 
conditional because this distribution describes only adults who satisfy the condition 
that they are women with no college. 


Marginal and Conditional Distributions 


The marginal distribution of one of the categorical variables in a two-way table of 
counts is the distribution of values of that variable among all individuals described 
by the table. 

A conditional distribution of a variable is the distribution of values of that vari- 
able among only individuals who have a given value of the other variable. There is a 
separate conditional distribution for each value of the other variable. 


6.2 Conditional Distributions 


(STATE? How do women and men with different levels of education differ in their 
responses to the question, “If you were free to do either, would you prefer to have a 
job outside the home, or would you prefer to stay home and take care of the house 
and family?” 


the four conditional distributions of response for each sex and education category. 
Compare these four distributions. 


ESSE Table 6.1 is the two-way table we need. Look first at just the “Women no 
college” row to find the conditional distribution for women with no college, then at 
just the “Women college” row to find the conditional distribution for women who 
have been to college, and so on. Here are the calculations and the four conditional 
distributions: 


Response Job Outside Home Stay Home No Preference 
Women no college & = 41.5% it = 53.3% is = 5.1% 
Women college te = 51.1% 503 = 38.0% 30; = 5.0% 
Men no college a = 73.0% i = 25.4% ae = 1.6% 
Men college 399 = 77.1% 3 = 20.9% 35 = 2.1% 


The percents in each row should be 100% because for each sex and education cat- 
egory, everyone holds one of the three lifestyle preferences. In fact, the percents add 
to 99.9%, 100.0%, or 100.1% because we round each one to the nearest tenth. 
This is roundoff error. 

Each set of percents adds to 100% because everyone holds one of the three 
opinions. 
[SSSR For those in the sample, men are more likely to prefer a job outside 
the home than are women, whether they went to college or not. Adults who attended 
college are more likely to prefer a job outside the home, whether they are women 
or men. m 


Software will do these calculations for you. Most programs allow you to choose 
which conditional distributions you want to compare. The output in Figure 6.2 
presents the four conditional distributions of preferred lifestyle, for each sex and 
education category, and also the marginal distribution of opinion for all the adults. 
The distributions agree (up to roundoff) with the results in Examples 6.2 and 6.3. 

Remember that there are two sets of conditional distributions for any two-way 
table. Example 6.3 looked at the conditional distributions of preferred lifestyles 
for the four sex and education categories. We could also examine the three condi- 
tional distributions of sex and education, one for each of the three preferred life- 
styles, by looking separately at the three columns in Table 6.1. Figure 6.3 makes 
this comparison in a bar graph. Each bar is divided (segmented) into four parts, 
represented by four colors. The upper portion of each bar represents the percent 
of women with no college among adults who prefer each lifestyle. The other por 
tions represent the percents of each of the other sex and education categories. 
Each bar has a height of 100% because each bar represents all the adults in each 
different group of people. Bar graphs like that in Figure 6.3 in which each bar 
is divided into parts, each part representing a different category, are sometimes 
called segmented bar graphs. 


roundoff error 


segmented bar graphs 
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FIGURE 6.2 


Minitab and Crunchit! output for 
the two-way table of adults by 


J 


Rows: Sex and education Columns: Preferred lifestyle iz 


preferred lifestyle and sex and 
education. Each entry in the Minitab 
output includes the percent of its 
row total. The “Men college,” “Men 
no college,” “Women college,” and 
“Women no college” rows give 

he conditional distributions of 
responses for each sex and educa- 
ion category, and the “All” row 
shows the marginal distribution of 
responses for all these adults. No- 
ice that Minitab orders variables in 
he table alphabetically. Each entry 
in the Crunchit! output includes the 
percent of its row total, the percent 
of its column total, and the percent 
of the entire table total. The second 
entry in each cell gives the condi- 
tional distribution of responses for 
the different sex and education cat- 
egories. The third entry in each cell 
for the preferred lifestyle columns 
gives the conditional distributions 
of responses for each category of 
preferred living. The “All” row and 
column show the corresponding 
marginal distribution of responses 
for all these adults. 


Men college 


Men no college 


Women college 


Women no college 


All 


Cell Contents: 


Women no college 


Men no college 


Men college 


Women college 


Count % of Row 


Chi-squared statistic: 
(eban 
P-value: 


Job 
outside 
home 


299 
77.06 


92 
73.02 


173 
57.10 


81 
41.54 


645 
63.74 


Count 


preference 


% of Row 


Stay home 


104 

53.33 
31.33 
10.28 


32 

25.40 
9.639 
3.162 


81 

20.88 
24.40 
8.004 


115 

37.95 
34.64 
11.36 


332 
3281 
100 
3281 


% of Col 


6 


Job outside home No preference 


81 

41.54 
12.56 
8.004 


92 

73.02 
14.26 
9.091 


299 

77.06 
46.36 
29.55 


173 

57.10 
26.82 
17.09 


645 
63.74 
100 
63.74 


No 
Stay home 


81 
20.88 


32 
25.40 


L115 
31895 


104 
53.33 


332 
32.81 


10 
5.128 
28.57 
0.9881 


2 
1.587 
5.714 
0.1976 


8 
2.062 
22.86 
0.7905 


15 

4.950 
42.86 
1.482 


35 
3.458 
100 
3.458 


% of Total 


83.10 


<0.0001 
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Sex and education FIGURE 6.3 
100 + Hd Women no college Segmented bar graph comparing the 

= HS Women college percents of women and no college 
¢ Hl Men no college (orange), women and college (blue), 
a Wi Men college d i d 
o 380- men and no college (green), and men 
> and college (red) among those who 
3 prefer each lifestyle. 
ce) 
i 60 4 
= 
S 
& 
7) 
> 
= 404 
v 
= 
+ 
= 
= 204 
= 
v 
5 

10) a 

Job outside home Stay home No preference 
Preferred lifestyle 


Figure 6.4 is called a mosaic plot and is a variation of a segmented bar graph. _ mosaic plot 
The bars now have different widths, and these widths correspond to the propor- 
tion of people in each of the three preferred lifestyle categories. Thus, the widths 
display the marginal distribution of preferred lifestyle. Each bar is again divided 
(segmented) into four parts, represented by four colors. The upper portion of each 
bar represents the proportion of women with no college among adults who prefer 
each lifestyle. The other portions represents the proportions of each of the other sex 
and education categories. Each bar has a height of 1 because each bar represents 
all the adults in each different group of people. The mosaic plot is more informa- 
tive than the segmented bar plot because it displays the marginal distribution of 


Mosaic Plot 


Women no 
college 


Women 
college 


0.50 
college 


Men 
college 


FIGURE 6.4 

Mosaic plot comparing the 
proportions of women and 
no college (orange), women 
and college (blue), men and 


Proportion of the preferred lifestyle group 


Men no | 


0.00 


Job outside home No Stay home 
preference no college (green), and men and 
Preferred lifestyle college (red) among those who 


prefer each lifestyle. 
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preferred lifestyle as well as the conditional distribution of sex and education given 
preferred lifestyle. 

Both Figures 6.3 and 6.4 display only one of the two sets of conditional distribu- 
tions. We would need another graph to display the other (the conditional distribu- 
tion of preferred lifestyle given sex and education). Also, the graphs only indicate 
percents or proportions, not total counts. 

No single graph (such as a scatterplot) portrays the form of the relationship 
A between categorical variables. No single numerical measure (such as the cor- 
relation) summarizes the strength of the association. Bar graphs are flexible 
enough to be helpful, but you must think about what comparisons you want to 
display. For numerical measures, we rely on well-chosen percents. You must decide 
which percents you need. Here is a hint: If there is an explanatory-response relation- 
ship, compare the conditional distributions of the response variable for the separate values 
of the explanatory variable. If you think that sex and education influences adults’ 
preferred lifestyle, compare the conditional distributions of preferred lifestyle for 
each sex and education category, as in Example 6.3. 


() LaunchPad Online Resources 


* The StatBoards video, Conditional Distributions, provides the details of 
calculating conditional distributions through an example. 


* The StatBoards video, Graphing a Two-Way Table, provides the details of 
creating a mosaic plot through an example. 


* The Video Technology Manuals discuss how to make a segmented bar 
graph using software for several statistical software packages. 


Apply Your Knowledge 


Keith Bedford/The New York 


Times/Redux 


6.3 Video Gaming and Grades. Exercise 6.1 gives data on the grade distribu- 
tion of boys who have and have not played video games. To see the relation- 
ship between grades and game-playing experience, find the conditional distri- 
butions of grades (the response variable) for players and nonplayers. What do 


you conclude? GAMING 


6.4 Ages of College Students. Exercise 6.2 gives U.S. Census Bureau data 
describing the age and sex of all American students enrolled in college. We 
suspect that the percent of women is higher among students in the 25- to 
34-year-old age group than in the 18- to 24-year-old age group. Do the data 
support this suspicion? Follow the four-step process as illustrated in Example 


6.5 Marginal Distributions Aren’t the Whole Story. Here are the row and 
column totals for a two-way table with two rows and two columns: 


a b 50 
c d 50 
60 40 100 


Make up two different sets of counts a, b, c, and d for the body of the table that 
give these same totals. This shows that the relationship between two variables 
cannot be obtained from the two individual distributions of the variables. 


As is the case with quantitative variables, the effects of lurking variables can 
change or even reverse relationships between two categorical variables. Here is 
an example that demonstrates the surprises that can await the unsuspecting user 
of data. 


Accident victims are sometimes taken by helicopter from the accident scene to a 
hospital. Helicopters save time. Do they also save lives? Let’s compare the percents 
of accident victims who die with helicopter evacuation and with the usual transport 
to a hospital by road. Here are hypothetical data that illustrate a practical difficulty:* 


Helicopter Road 
Victim died 64 260 
Victim survived 136 840 
Total 200 1100 


We see that 32% (64 out of 200) of helicopter patients died, but only 24% (260 
out of 1100) of the others did. That seems discouraging. 

The explanation is that the helicopter is sent mostly to serious accidents, so that 
the victims transported by helicopter are more often seriously injured. They are more 
likely to die with or without helicopter evacuation. Here are the same data broken 
down by the seriousness of the accident: 


Serious Accidents Less Serious Accidents 


Helicopter Road Helicopter Road 
Died 48 60 Died 16 200 
Survived 5 40 Survived 84 800 
Total 100 100 Total 100 1000 


Inspect these tables to convince yourself that they describe the same 1300 accident 
victims as the original two-way table. For example, 200 (= 100 + 100) were moved 
by helicopter, and 64 (= 48 + 16) of these died. 

Among victims of serious accidents, the helicopter saves 52% (52 out of 100) 
compared with 40% for road transport. If we look only at less serious accidents, 
84% of those transported by helicopter survive, versus 80% of those transported 
by road. Both groups of victims have a higher survival rate when evacuated by 
helicopter. m 


How can it happen that the helicopter does better for both groups of victims 
but worse when all victims are lumped together? Examining the data makes 
the explanation clear. Half the helicopter transport patients are from serious 
accidents, compared with only 100 of the 1100 road transport patients. So the 
helicopter carries patients who are more likely to die. The seriousness of the acci- 
dent was a lurking variable that, until we uncovered it, hid the true relationship 
between survival and mode of transport to a hospital. Example 6.4 illustrates 
Simpson’s paradox. 


6.3 Simpson’s Paradox 
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Simpson’s Paradox 


An association or comparison that holds for all of several groups can reverse direction 
when the data are combined to forma single group. This reversal is called Simpson’s 
paradox. 


The lurking variable in Simpson’s paradox is categorical. That is, it breaks the 
individuals into groups, as when accident victims are classified as injured in a 
“serious accident” or a “less serious accident.” Simpson's paradox is just an extreme 
form of the fact that observed associations can be misleading when there are lurk- 
ing variables. 


Apply Your Knowledge 


6.6 Field Goal Shooting. Here are data on field goal shooting for two members 
of the Benedict College 2012-13 men’s basketball team:° BSKTBALL 


Alex Brailsford Rickie Jackson 

Made Missed Made Missed 
Two-pointers 115) i155 30 29 
Three-pointers , 15 65 130 


(a) What percent of all field goal attempts did Alex Brailsford make? What 
percent of all field goal attempts did Rickie Jackson make? 


(b) Now find the percent of all two-point field goals and all three-point field 
goals that Alex made. Do the same for Rickie. 


(c) Alex had a lower percent for both types of field goals, but had a better 
overall percent. That sounds impossible. Explain carefully, referring to 
the data, how this can happen. 


6.7. Bias in the Jury Pool? The New Zealand Department of Justice did a 
study of the composition of juries in court cases. Of interest was whether 
Maori, the indigenous people of New Zealand, were adequately represented 
in jury pools. Here are the results for two districts, Rotorua and Nelson, in 
New Zealand (similar results were found in all districts) :° JURY 


= Rotorua Nelson 

3 Maori Non-Maori Maori Non-Maori 

F In jury pool 79 258 In jury pool 1 56 

2 Not in jury pool | 8810 23,751 Not in jury pool | 1328 32,602 
Total 8889 24,009 Total 1329 32,658 


(a) Compare percents to show that the percent of all Maori in the jury pool 
in each district is less than the percent of non-Maori in the jury pool. 


(b) Combine the data into a single two-way table of outcome (“in jury pool” 
or “not in jury pool”) by ethnicity (Maori or non-Maori). The origi- 
nal study only reported such an overall rate. Which ethnic group has a 
higher percent of its people in the jury pool? 

(c) Explain from the data, in language that a reporter can understand, how 
Maori can have a higher percent overall even though non-Maori have 
higher percents for both districts. 


CHAPTER 6 SUMMARY 


Chapter Specifics 


m A two-way table of counts organizes data about two categorical variables. Values of 
the row variable label the rows that run across the table, and values of the column 
variable label the columns that run down the table. Two-way tables are often used to 
summarize large amounts of information by grouping outcomes into categories. 


gm The row totals and column totals in a two-way table give the marginal distributions 
of the two individual variables. It is clearer to present these distributions as percents of 
the table total. Marginal distributions tell us nothing about the relationship between 
the variables. 


m There are two sets of conditional distributions for a two-way table: the distributions 
of the row variable for each fixed value of the column variable and the distributions 
of the column variable for each fixed value of the row variable. Comparing one set of 
conditional distributions is one way to describe the association between the row and 
the column variables. 


m To find the conditional distribution of the row variable for one specific value of the 
column variable, look only at that one column in the table. Find each entry in the 
column as a percent of the column total. 


= Bar graphs are a flexible means of presenting categorical data. There is no single best 
way to describe an association between two categorical variables. 


= A comparison between two variables that holds for each individual value of a third 
variable can be changed or even reversed when the data for all values of the third vari- 
able are combined. This is Simpson’s paradox. Simpson’s paradox is an example of the 
effect of lurking variables on an observed association. 


Link It 


In Chapters 4 and 5 we considered relationships between two quantitative variables. In this 
chapter we use two-way tables to describe relationships between two categorical variables. 
To explore relationships between two categorical variables, we examine their conditional 
distributions. Changes in the pattern of the conditional distribution of one variable as the 
value of the other varies provides information about the relationship between the variables. 
No change in this pattern suggests that there is no relationship. 

As in Chapters 4 and 5, we must be careful not to assume that the patterns we observe 
would continue to hold for additional data or in a broader setting. Simpson’s paradox is an 
example of how such an assumption could mislead us. 

An important question is whether a pattern we observe while exploring a given set of data 
holds for values of our variables that we have not observed—in other words, that additional 
data would continue to conform to these patterns. We will begin to answer this question in 
Part III. And in Chapter 25 we will learn the answer to this question for two-way tables. 


() LaunchPad Online Resources 


If you are having difficulty with any of the sections of this chapter, these online 
resources should help prepare you to solve the exercises at the end of this chapter. 


* StatTutor starts with a video review of each section and asks a series of ques- 
tions to check your understanding. You may find this helpful if you need addi- 
tional help understanding how to calculate marginal distributions or conditional 
distributions, or if you want to see more examples of Simpson’s paradox. 


* LearningCurve provides you with a series of questions about the chapter 
geared to your level of understanding. 
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CHECK YOUR SKILLS 


The Pew Internet and American Life Project interviewed several 
hundred teens (aged 12 to 17). One question asked was, “How 
often do you take your cell phone to school?” Here is a two-way 
table of the responses by how permissive the school is with regard 
to cell phone use:’ 


ALLOW IN 
SCHOOL BUT ALLOW 
FREQUENCY FORBID NOTIN CLASS IN CLASS 
Never NS 19 4 
Less often 14 31 6 
At least several times per week} 14 23 8 
Every day 97 314 57 


Exercises 6.8 to 6.16 are based on this table. 


6.8 


6.9 


6.11 


6.12 


6.13 


| PHONE 


How many individuals are described by this table? 
(a) 468 (b) 612 
How many teens from schools that forbid cell phones 
were among the respondents? 

(a) 48 (b) 150 
The percent of teens from schools that forbid cell 
phones among the respondents was 

(a) about 8%. 

(b) about 25%. 

(c) about 48%. 


Your percent from Exercise 6.10 is part of 


(c) Need more information 


(c) Need more information 


(a) the marginal distribution of school permissiveness. 
(b) the marginal distribution of the frequency at 
which a teen brought a cell phone to school. 

(c) the conditional distribution of the frequency at 
which a teen brought a cell phone to school among 
schools with a given level of permissiveness. 


What percent of teens from schools that forbid cell 
phones brought their cell phone to school every day? 


(a) about 16% (b) about 21% (c) about 65% 
Your percent from Exercise 6.12 is part of 


(a) the marginal distribution of the frequency at 
which a teen brought a cell phone to school. 

(b) the conditional distribution of school permissive- 
ness among those who brought a cell phone to school 
every day. 


6.14 


6.15 


6.16 


6.17 


(c) the conditional distribution of the frequency at 
which a teen brought a cell phone to school among 
schools that forbid cell phones. 


What percent of those who brought their cell phone 
to school every day were from schools that forbid cell 
phones? 


(a) about 16% (b) about 21% (c) about 65% 
Your percent from Exercise 6.14 is part of 


(a) the marginal distribution of the frequency at 
which a teen brought a cell phone to school. 

(b) the conditional distribution of school permissive- 
ness among those who brought a cell phone to school 
every day. 

(c) the conditional distribution of the frequency at 
which a teen brought a cell phone to school among 
schools with a given level of permissiveness. 


A bar graph showing the conditional distribution of 
the frequency at which a teen brought a cell phone to 
school among schools with a given level of permissive- 
ness would have 


(a) 3 bars. (b) 4 bars. (c) 12 bars. 


A college looks at the grade point average (GPA) 
of its full-time and part-time students. Grades in 
science courses are generally lower than grades in 
other courses. There are few science majors among 
part-time students, but many science majors among 
full-time students. The college finds that full-time 
students who are science majors have higher GPAs 
than part-time students who are science majors. 
Full-time students who are not science majors also 
have higher GPAs than part-time students who 
are not science majors. Yet part-time students as 
a group have higher GPAs than full-time students. 
This finding is 

(a) not possible: if both science and other majors who 
are full-time have higher GPAs than those who are 
part-time, then all full-time students together must 
have higher GPAs than all part-time students together. 
(b) an example of Simpson’s paradox: full-time 
students do better in both kinds of courses but worse 
overall because they take more science courses. 

(c) due to comparing two conditional distributions 
that should not be compared. 
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6.18 


Is astrology scientific? The University of Chicago's 
General Social Survey (GSS) is the nation’s most 
important social science sample survey. The GSS 
asked a random sample of adults their opinion about 
whether astrology is very scientific, sort of scientific, or 
not at all scientific. Here is a two-way table of counts 
for people in the sample who had three levels of higher 
education degrees:* £ijj,) AsTRLGY 


DEGREE HELD 


JUNIOR COLLEGE BACHELOR GRADUATE 


Not at all scientific 


44 122 71 


Very or sort of scientific 


6.19 


Jl 62 27 


Find the two conditional distributions of degree held: 
one for those who hold the opinion that astrology is 
not at all scientific, and one for those who say astrology 
is very or sort of scientific. Based on your calculations, 
describe with a graph and in words the differences 
between those who say astrology is not at all scientific 
and those who say it is very or sort of scientific. 

Weight-lifting injuries. Resistance training is a popular 
form of conditioning aimed at enhancing sports perfor- 
mance, and is widely used among high school, college, 
and professional athletes, although its use for younger 
athletes is controversial. A random sample of 4111 
patients between the ages of 8 and 30 admitted to U.S. 
emergency rooms with the injury code “weightlifting” 
was obtained. These injuries were classified as “acciden- 
tal” if caused by dropped weight or improper equipment 
use. The patients were also classified into the four age 
categories 8 to 13 years, 14 to 18, 19 to 22, and 23 to 30. 


Here is a two-way table of the results:? jj) Lirtina 
AGE ACCIDENTAL NOT ACCIDENTAL 
8-13 295 102 
14-18 655 916 
19-22 239) 335) 
23-30 363 1008 


Compare the distributions of ages for accidental and 
nonaccidental injuries. Use percents and draw a bar 
graph. What do you conclude? 


Marital status and income. We sometimes hear that getting mar- 
ried is good for your career. Table 6.2 presents data from the U.S. 
Census Bureau that classifies men, ages 45-64, according to mari- 
tal status and annual income in 2011-12. We include only data on 
men, ages 45-64, to avoid sex bias and reduce the effect of age.!° 
Exercises 6.20 to 6.24 are based on these data. 


6.20 


6.21 


6.22 


6.23 


6.24 


Marginal distributions. Give (in percents) the two 
marginal distributions, for marital status and for 
income. Do each of your two sets of percents add to 
exactly 100%? If not, why not? STATUS 


Percents. What percent of single men have no 
income? What percent of men with no income are 
single men? |) status 


Conditional distribution. Give (in percents) the 
conditional distribution of income level among single 
men. Should your percents add to 100% (up to round- 
off error)? Explain your reasoning. STATUS 


Marital status and income. One way to see the rela- 
tionship is to look at who has no income. fj),) 


(a) There are 995,000 married men with no income, 
and 513,000 single men with no income. Explain 
why these counts by themselves don’t describe the 
relationship between marital status and job grade. 
(b) Find the percent of men in each marital status 
group who have no income. Then find the percent 
in each marital group who have an income of 
$100,000 and over. What do these percents say 
about the relationship? 


Association is not causation. The data in Table 6.2 
show that single men are more likely to hold lower- 
income jobs than are married men. We should not 
conclude that single men can increase their income 
by getting married. What lurking variables might help 


MARITAL STATUS 
SINGLE 
INCOME (NEVER MARRIED) MARRIED DIVORCED WIDOWED TOTAL 
No income Dil 995 385 25 1,918 
$1-$49,999 8323 13,478 3,678 453 20,932 
$50,000-$99,999 814 8,492 1,316 128 10,750 
$100,000 and over 288 5,167 544 49 6,048 
Total 4,938 28,132 5,923 655 39,648 
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explain the association between marital status and 
income? 


Race and the death penalty. Whether a convicted 
murderer gets the death penalty seems to be influ- 
enced by the race of the victim. Several researchers 
studied this issue in the 1970s and 1980s, resulting 
in several landmark, oft-cited, and controversial 
papers. Here are data on 326 cases in which the 
defendant was convicted of murder from one of 
these studies:"’ jy) piscrim 


WHITE DEFENDANT BLACK DEFENDANT 


WHITE BLACK WHITE BLACK 

VICTIM | VICTIM VICTIM | VICTIM 
Death 19 0 Death 11 6 
Not (322 9 Not 52 97 


(a) Use these data to make a two-way table of 
defendant’s race (white or black) versus death 
penalty (yes or no). 

(b) Show that Simpson’s paradox holds: a higher 
percent of white defendants are sentenced to 
death overall, but for both black and white victims 
a higher percent of black defendants are sentenced 
to death. 

(c) Use the data to explain why the paradox holds, in 
language that a judge could understand. 


Obesity and health. To estimate the health risks of 
obesity, we might compare how long obese and non- 
obese people live. Smoking is a lurking variable that 
may reduce the gap between the two groups because 
smoking tends to both reduce weight and lead to ear- 
lier death. So if we ignore smoking, we may underes- 
timate the health risks of obesity. Illustrate Simpson’s 
paradox by a simplified version of this situation: make 
up two-way tables of obese (yes or no) by early death 
(yes or no) separately for smokers and nonsmokers 
such that 


Obese smokers and obese nonsmokers are both more 
likely to die earlier than those who are not obese. 
But when smokers and nonsmokers are combined 
into a two-way table of obese by early death, per- 
sons who are not obese are more likely to die earlier 
because more of them are smokers. 


known as Wellbutrin 
or Zyban) and a 
placebo. Chantix is 
different from most 
other quit-smoking 
products in that 
it targets nicotine 
receptors in the 
brain, attaches to 
them, and_ blocks 
nicotine from reach- 
ing them, whereas 
bupropion is an 
antidepressant often 
used to help people 
stop smoking. Gen- 
erally healthy smok- 
ers who smoked at least 10 cigarettes per day were 
assigned at random to take Chantix (n = 352), bupro- 
pion (n = 329), or a placebo (n = 344). The response 
measure is continuous cessation from smoking for 
Weeks 9 through 12 of the study. Here is a two-way 
table of the results:'* Ey, 


alibutrin XL 


Joe Raedle/Getty Images 


TREATMENT 
CHANTIX BUPROPION PLACEBO 
No smoking in Weeks 9-12 155 97 61 
Smoked in Weeks 9-12 197 232 283 
6.28 Animal testing. “It is right to use animals for medi- 


cal testing if it might save human lives.” The Gen- 
eral Social Survey asked 1152 adults to react to 
this statement. Here is the two-way table of their 


responses: fj] ANTEST 
RESPONSE MALE FEMALE 
Strongly agree 76 59 
Agree 270 247 
Neither agree nor disagree 87 139 
Disagree 61 123) 
Strongly disagree 22 68 


How do the distributions of opinion differ between 
men and women? 


The following exercises ask you to answer questions from data with- 
out having the details outlined for you. The exercise statements give 
you the State step of the four-step process. In your work, follow the 
Plan, Solve, and Conclude steps of the process as illustrated in 
Example 6.3. 


6.29 College degrees. “Colleges and universities across 
the country are grappling with the case of the mys- 
teriously vanishing male.” So said an article in the 
Washington Post. Here are projections of the num- 
bers of degrees that will be earned in 2021-22, as 
projected by the National Center for Education 
Statistics. The table entries are counts of degrees in 


thousands.’* £jy)| DEGREES 


6.27 Smoking cessation. A large randomized trial was con- 
4 ducted to assess the efficacy of Chantix for smoking 


cessation compared with bupropion (more commonly 


6.30 


DEGREE FEMALE MALE 
Associate’s 646 383 
Bachelor's 1160 844 
Master’s 576 354 
Professional or Doctor’s 106 92 


Briefly contrast the counts and distributions of men 
and women in earning degrees. Are men projected to 
be “vanishing” from colleges and universities across 
the country? 


Complications of bariatric surgery. Bariatric surgery, 
or weight-loss surgery, includes a variety of procedures 
performed on people who are obese. Weight loss is 
achieved by reducing the size of the stomach with an 
implanted medical device (gastric banding), by remov- 
ing a portion of the stomach (sleeve gastrectomy), or by 
resecting and rerouting the small intestines to a small 
stomach pouch (gastric bypass surgery). Because there 
can be complications using any of these methods, the 
National Institute of Health recommends bariatric sur- 
gery for obese people with a body mass index (BMI) 
of at least 40, and for people with BMI 35 and serious 
coexisting medical conditions such as diabetes. Serious 
complications include potentially life-threatening, per- 
manently disabling, and fatal outcomes. Here is a two- 
way table for data collected in Michigan over several 
years, giving counts of non-life-threatening complica- 
tions, serious complications, and no complications for 
these three types of surgeries:'4 BARI 


TYPE OF COMPLICATION 
NON-LIFE- 
THREATENING SERIOUS NONE | TOTAL 
Gastric banding 81 46 5253 | 5380 
Sleeve gastrectomy Bil 19 804 854 
Gastric bypass 606 B25 8110 | 9041 
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What do the data say about differences in complica- 
tions for the three types of surgeries? 


Smokers rate their health. The University of Michi- 
gan Health and Retirement Study (HRS) surveys 
more than 22,000 Americans over the age of 50 every 
two years. A subsample of the HRS participated in a 
2009 Internet-based survey that collected informa- 
tion on a number of topical areas, including health 
(physical and mental health behaviors), psychosocial 
items, economics (income, assets, expectations, and 
consumption), and retirement.’ Two of the questions 
asked were, “Would you say your health is excellent, 
very good, good, fair, or poor?” and “Do you smoke 
cigarettes now?” The two-way table summarizes the 
answers on these two questions. SMRATE 
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CURRENT SMOKER 
HEALTH YES NO 
Excellent D5) 484 
Very good WS 1557 
Good 145 1309 
Fair 90 545 
Poor 29 Ih 


What do the data say about differences in self- 
evaluation of health for current smokers and non- 
smokers? 


Punxsutawney Phil. In the United States, Ground- 
hog Day is celebrated on February 2. On February 2, 
Punxsutawney Phil, a mythical groundhog in Punx- 
sutawney, PA, emerges from his home and if he sees 
his shadow and returns to his hole, he has predicted 
six more weeks of winter-like weather. How accu- 
rate is Phil? We have data for 113 years (up to 2013) 
indicating whether Phil saw his shadow and average 
temperature in March. It is not clear what consti- 
tutes “six more weeks of winter-like weather,” so here 
we define it to occur if the average temperature for 
March is not above the historical average. The results 
are: 52 years where Phil saw his shadow and the aver- 
age March temperature was not above the historical 
average, 46 years where Phil saw his shadow and the 
average March temperature was above the historical 
average, 5 years where Phil did not see his shadow and 
the average March temperature was not above the his- 
torical average, and 10 years where Phil did not see 
his shadow and the average March temperature was 
above the historical average.!° 


(a) Make a two-way table of “Phil saw his shadow or 
not” against “above historical average temperatures in 
March or not.” 

(b) What do the data tell us about Phil as a weather 


forecaster? 


Sleep quality. A random sample of 871 students 
between the ages of 20 and 24 at a large midwestern 
university completed a survey including questions 
about their sleep quality, moods, academic perfor- 
mance, physical health, and psychoactive drug use. 
Sleep quality was measured using the Pittsburgh 
Sleep Quality Index (PSQI), with students scoring 
less than or equal to 5 on the index classified as opti- 
mal sleepers, those scoring a 6 or 7 classified as bor- 
derline, and those scoring over 7 classified as poor 
sleepers. The following table looks at the relationship 
between sleep quality classification and the use of 
over-the-counter (OTC) or prescription (Rx) stim- 
ulant medication more than once a month to help 
keep awake.’ £yy)) sLeerq 
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USE OF OTC/RX MEDS SLEEP QUALITY ON PSQI INDEX What do the data say about differences in sleep 
TO WAKE > 1X/MONTH | OPTIMAL BORDERLINE POOR quality for those who use over-the-counter or pre- 
Yee 37 53 84 scription stimulant medication medication more 
No 526 Re a5 than once a month to keep awake, and those who 


don’t? 
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Promoting women. In academics, faculty typically start as assistant professors, 
are promoted to associate professor (and gain tenure), and finally reach the rank of 
full professor. Some have argued that women have a harder time gaining promotion 
to associate and full professor than do men. Do data support this argument? Search 
the web to find the number of faculty by rank and gender at some university. Do 
you see a pattern that suggests that the proportion of women decreases as rank 
increases? We found several sources of data by doing a Google search on “faculty 
head count by rank and gender.” In addition to discussing the pattern you find, 
provide the data, the name of the school, and the source of the data. 


Accidental deaths and age. Accidental deaths are shocking and tragic. Do the 
ways in which people die by accident change with age? Look at the most recent 
Statistical Abstract of the United States (www.census.gov/compendia/ 
statab/) and make a two-way table that provides the counts of deaths due 
to accidents from various causes for three different age groups. What do you 
conclude? 


Simpson’s paradox. Find an example of Simpson’s paradox and discuss 
how your example illustrates the paradox. Two examples that we found (thanks 
to Patricia Humphrey at Georgia Southern University) are www.nytimes. 
com/2006/07/15/education/15report .html and online.wsj.com/ 
article/SB125970744553071829.html. 


ata analysis is the art of describing data using graphs and numerical sum- 

maries. The purpose of exploratory data analysis is to help us see and under- 

stand the most important features of a set of data. Chapter 1 commented on 
graphs to display distributions: pie charts and bar graphs for categorical variables, 
histograms and stemplots for quantitative variables. In addition, time plots show 
how a quantitative variable changes over time. Chapter 2 presented numerical tools 
for describing the center and variability of the distribution of one variable. Chapter 
3 discussed density curves for describing the overall pattern of a distribution, with 
emphasis on the Normal distributions. 

The first STATISTICS IN SUMMARY figure on the next page organizes the big 
ideas for exploring a quantitative variable. Plot your data, then describe their center 
and variability using either the mean and standard deviation or the five-number 
summary. The last step, which makes sense only for some data, is to summarize 
the data in compact form by using a Normal curve as a description of the overall 
pattern. The question marks at the last two stages remind us that the usefulness of 
numerical summaries and Normal distributions depends on what we find when we 
examine graphs of our data. No short summary does justice to irregular shapes or 
to data with several distinct clusters. 


Part | Summary 
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Analyzing Data for One Variable 

Plot your data: 

Stemplot, histogram 

| Interpret what you see: 
Shape, center, variability, outliers 
| Numerical summary? 
x ands, five-number summary? 
| Density curve? 
Normal distribution? 


Chapters 4 and 5 applied the same ideas to relationships between two quantita- 
tive variables. The second STATISTICS IN SUMMARY figure retraces the big 
ideas, with details that fit the new setting. Always begin by making graphs of your 
data. In the case of a scatterplot, we have learned a numerical summary only for 
data that show a roughly linear pattern on the scatterplot. The summary is then 
the means and standard deviations of the two variables and their correlation. A 
regression line drawn on the plot gives a compact description of the overall pattern 
that we can use for prediction. Once again there are question marks at the last two 
stages to remind us that correlation and regression describe only straight-line rela- 
tionships. Chapter 6 shows how to understand relationships between two categori- 
cal variables; comparing well-chosen percents is the key. 

You can organize your work in any open-ended data analysis setting by following 
the four-step State, Plan, Solve, and Conclude process first introduced in Chapter 2. 
After you have mastered the extra background needed for statistical inference, this 
process will also guide practical work on inference later in the book. 


Analyzing Data for Two Variables 


Plot your data: 
Scatterplot 


| Interpret what you see: 
Direction, form, strength. 


Linear? 


| Numerical summary? 
X, Yi Sy Sy and 2 
— Regression line? | 


Here are the most important skills you should have acquired from reading 
Chapters | to 6. 
A. Data 
1. Identify the individuals and variables in a set of data. 


2. Identify each variable as categorical or quantitative. Identify the units in 
which each quantitative variable is measured. 


3. Identify the explanatory and response variables in situations where one 
variable explains or influences another. 
B. Displaying Distributions 
1. Recognize when a pie chart can and cannot be used. 


2. Make a bar graph of the distribution of a categorical variable or, in general, 
to compare related quantities. 


3. Interpret pie charts and bar graphs. 
4, Make a histogram of the distribution of a quantitative variable. 


5. Make a stemplot of the distribution of a small set of observations. Round 
leaves or split stems as needed to make an effective stemplot. 


6. Make a time plot of a quantitative variable over time. Recognize patterns 
such as trends and cycles in time plots. 
C. Describing Distributions (Quantitative Variable) 
1. Look for the overall pattern and for major deviations from the pattern. 


2. Assess from a histogram or stemplot whether the shape of a distribution is 
roughly symmetric, distinctly skewed, or neither. Assess whether the distri- 
bution has one or more major peaks. 

3. Describe the overall pattern by giving numerical measures of center and 
variability in addition to a verbal description of shape. 

4. Decide which measures of center and variability are more appropriate: the 
mean and standard deviation (especially for symmetric distributions) or the 
five-number summary (especially for skewed distributions). 


5. Recognize outliers and give plausible explanations for them. 


D. Numerical Summaries of Distributions 
1. Find the median M and the quartiles Q,; and Q; for a set of observations. 


2. Find the five-number summary and draw a boxplot; assess center, variability, 
symmetry, and skewness from a boxplot. 

3. Find the mean x and the standard deviation s for a set of observations. 

4. Understand that the median is more resistant than the mean. Recognize 
that skewness in a distribution moves the mean away from the median 
toward the long tail. 

5. Know the basic properties of the standard deviation: s = 0 always; s = 0 
only when all observations are identical and increases as the variability 
increases; s has the same units as the original measurements; s is pulled 
strongly up by outliers or skewness. 


E. Density Curves and Normal Distributions 


1. Know that areas under a density curve represent proportions of all observa- 
tions and that the total area under a density curve is 1. 
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Approximately locate the median (equal-areas point) and the mean (bal- 
ance point) on a density curve. 

Know that the mean and median both lie at the center of a symmetric 
density curve and that the mean moves farther toward the long tail of a 
skewed curve. 


Recognize the shape of Normal curves and estimate by eye both the mean 
and standard deviation from such a curve. 


Use the 68—95—99.7 rule and symmetry to state what percent of the obser- 
vations from a Normal distribution fall between two points when both 
points lie at the mean or one, two, or three standard deviations on either 
side of the mean. 


Find the standardized value ( z-score) of an observation. Interpret z-scores 
and understand that any Normal distribution becomes the standard 
Normal N(0, 1) distribution when standardized. 

Given that a variable has a Normal distribution with a stated mean pw and 
standard deviation a, calculate the proportion of values above a stated 
number, below a stated number, or between two stated numbers. 

Given that a variable has a Normal distribution with a stated mean pw and 
standard deviation oa, calculate the point having a stated proportion of all 
values above it or below it. 


Scatterplots and Correlation 


IL 


Make a scatterplot to display the relationship between two quantitative 
variables measured on the same subjects. Place the explanatory variable (if 
any) on the horizontal scale of the plot. 


Add a categorical variable to a scatterplot by using a different plotting 
symbol or color. 


Describe the direction, form, and strength of the overall pattern of a scat- 
terplot. In particular, recognize positive or negative association and linear 
(straight-line) patterns. Recognize outliers in a scatterplot. 

Judge whether it is appropriate to use correlation to describe the relation- 
ship between two quantitative variables. Find the correlation r. 


Know the basic properties of correlation: r measures the direction and 
strength of only straight-line relationships; r is always a number between 
—1 and 1; r= +1 only for perfect straight-line relationships; r moves 
away from 0 toward +1 as the straight-line relationship gets stronger. 


Regression Lines 


iL 


Understand that regression requires an explanatory variable and a response 
variable. Correctly identifying which variable is the explanatory variable 
and which is the response variable is important. Switching these will result 
in different regression lines. Use a calculator or software to find the least- 
squares regression line of a response variable y on an explanatory variable 
x from data. 


Explain what the slope b and the intercept a mean in the equation 
y =a + bx of a regression line. 


Draw a graph of a regression line when you are given its equation. 


Use a regression line to predict y for a given x. Recognize extrapolation and 
be aware of its dangers. 


Find the slope and intercept of the least-squares regression line from the 
means and standard deviations of x and y and their correlation. 


Use r’, the square of the correlation, to describe how much of the variation 
in one variable can be accounted for by a straight-line relationship with 
another variable. 

Recognize outliers and potentially influential observations from a scatter- 
plot with the regression line drawn on it. 

Calculate the residuals and plot them against the explanatory variable x. 


Recognize that a residual plot magnifies the pattern of the scatterplot of y 
versus x. 


Cautions about Correlation and Regression 


ik. 


Understand that both r and the least-squares regression line can be strongly 
influenced by a few extreme observations. 

Recognize possible lurking variables that may explain the observed associa- 
tion between two variables x and y. 

Understand that even a strong correlation does not mean there is a cause- 
and-effect relationship between x and y. 

Give plausible explanations for an observed association between two vari- 
ables: direct cause and effect, the influence of lurking variables, or both. 


Categorical Data 
1. 


From a two-way table of counts, find the marginal distributions of both 
variables by obtaining the row sums and column sums. 

Express any distribution in percents by dividing the category counts by 
their total. 

Describe the relationship between two categorical variables by comput- 
ing and comparing percents. Often this involves comparing the condi- 
tional distributions of one variable for the different categories of the 
other variable. 

Recognize Simpson’s paradox and be able to explain it. 


TERY 


The questions below include multiple-choice, calculations, and short-answer questions. They will 
help you review the basic ideas and skills presented in Chapters 1 to 6. 


7.1 


7.2 


As part of a database on new births at a hospital, some variables recorded are the 
age of the mother, marital status of the mother (single, married, divorced, other), 
weight of the baby, and sex of the baby. Of these variables 


(a) age, marital status, and weight are quantitative variables. 
(b) age and weight are categorical variables. 

(c) sex and marital status are categorical variables. 

(d) sex, marital status, and age are categorical variables. 


You are interested in obtaining information about the performance of students 
in your statistics class and seeing how this performance is affected by several 
factors such as sex. To do this, you are going to give a questionnaire to all 
students in the class. Give two questions for which the response is categorical 
and two questions for which the response is quantitative. For the categorical 
variables, give the possible values; for the quantitative variables, give the unit 
of measurement. 
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DRIVING IN 


cH CANADA 


Canada is a civilized and restrained 
nation, at least in the eyes of 
Americans. A survey sponsored by 
the Canada Safety Council suggests 
that driving in Canada may be 
more adventurous than expected. 
Of the Canadian drivers surveyed, 
88% admitted to aggressive driving 
in the past year, and 76% said 

that sleep-deprived drivers were 
common on Canadian roads. What 
really alarms us is the name of 

the survey: the Nerves of Steel 
Aggressive Driving Study. 
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FIGURE 7.1 

Histogram of the number of seeds 
produced by velvetleaf plants when no 
herbicide was used, for Questions 7.3 
LORS: 
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Weeds among the corn. Velvetleaf is a particularly annoying 
weed in corn fields. It produces lots of seeds, and the seeds wait in the 
soil for years until conditions are right. How many seeds do velvetleaf 
plants produce? Figure 7.1 is a histogram of the number of seeds pro- 
duced from 28 velvetleaf plants that sprouted in a corn field where 
no herbicide was used.! Use this histogram to answer Questions 7.3 


TON: 


7.3. The histogram 


(a) is skewed right. (b) has outliers. 
(c) is asymmetric. (d) is all of the above. 


The median number of seeds produced is 


(a) under 1000. (b) between 1000 and 2000. 
(c) between 2000 and 3000. (d) over 3000. 

The percent of plants that produced over 2000 seeds is 

(a) about 15%. (b) about 25%. 

(c) about 35%. (d) about 45%. 


A reporter wishes to portray baseball players as overpaid. Which measure of center 
should he report as the average salary of major league players? 

(a) The mean 

(b) The median 

(c) Either the mean or the median. It doesn’t matter since they will be equal. 

(d) Neither the mean nor the median. Both will be much lower than the actual 
average salary. 


El Nifio and the monsoon. The earth is interconnected. For example, it appears that 
EI Nifio, the periodic warming of the Pacific Ocean west of South America, affects the mon- 
soon rains that are essential for agriculture in India. Here are the monsoon rains (in millime- 
ters) for the 23 strong El Nifio years between 1871 and 2004, arranged in increasing order:? 


604 628 651 653 669 698 710 717 736 740 781 784 
790 790 792 806 811 830 858 858 872 896 957 


Use these data to answer Questions 7.7 to 7.9. 


7.7 


7.8 


7.9 


7.11 


What is the median amount of rainfall for the strong El Nifio years? 


(a) 669 millimeters (b) 698 millimeters 
(c) 784 millimeters (d) 830 millimeters 
What is the first quartile for these data? 

(a) 669 millimeters (b) 698 millimeters 
(c) 784 millimeters (d) 830 millimeters 


The average monsoon rainfall for all years from 1871 to 2004 is about 850 millime- 
ters. What effect does El Nifio appear to have on monsoon rains? 


(a) Strong El Nifio years tend to have higher monsoon rainfalls than in other 
years. 

(b) Strong El Nifio years tend to have the same monsoon rainfalls as in other years. 
(c) Strong El Nifio years tend to have lower monsoon rainfalls than in other 
years. 

(d) None of the above. 


Which of the following is likely to have a mean that is smaller than the median? 


(a) The salaries of all National Football League players 

(b) The scores of students (out of 100 points) on a very easy exam in which most 
students score perfectly, but a few do very poorly 

(c) The prices of homes in a large city 

(d) The scores of students (out of 100 points) on a very difficult exam in which 
most students score poorly, but a few do very well 


For a biology project, you measure the tail length in centimeters and weight in 
grams of 12 mice of the same variety. What units of measurement do each of the 
following have? 


(a) The mean tail length 

b) The first quartile of the tail lengths 

c) The standard deviation of the tail lengths 
d) The variance of the weights 


( 
( 
( 


Travel times. How long must you travel each day to get to work? Here is a stemplot of the aver- 
age travel times to work for workers in the 50 states and the District of Columbia who are at least 
16 years of age and don’t work at home.* The stems are whole minutes, and the leaves are tenths 
of a minute. Use the stemplot to answer Questions 7.12 and 7.13. 
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FIGURE 7.2 

Bar graph of the distribution of search 
engine use for the most frequent use 
categories in 2012, for Questions 7.14 
and 7.15. 


7.12 


7.13 


The shape of the distribution is 
(a) clearly skewed to the right. 
(b) roughly symmetric. 

(c) clearly skewed to the left. 


The percent of states for which the average travel time is 20 minutes or more is 
closest to 


(a) 16%. (b) 27%. (c) 73%. (d) 84%. 


Search engine use. In 2012, approximately 91% of Internet users used a search engine to find 
information on the web. Search engine users were asked, “How often do you use a search engine to 
find information online?” The bar graph in Figure 7.2 gives the percentages for the most frequent 
use categories.* Use the bar graph to help answer Questions 7.14 and 7.15. 


7.14 


7.15 
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Approximately what percent of search engine users used a search engine one to two 
days per week? 


(a) 5% (b) 10% (c) 15% (d) 20% 


The total of the percents of the bars in the graph is 92%. Which of the following 
statements is true? 


(a) The percent of search engine users who use a search engine less than once every 
few weeks is greater than 10%. 

(b) A pie chart could be drawn for the categories given in the bar graph. 

(c) The percent of search engine users who use a search engine at least three to five 
days a week is about 70%. 

(d) None of the above. 


Reports on a student’s ACT, SAT, or MCAT usually give the percentile as well 
as the actual score. The percentile is just the cumulative proportion stated as a 
percent: the percent of all scores that were lower than this one. In 2013, the total 


7.17 


MCAT scores were close to Normal with mean 25.3 and standard deviation 6.5.° 
William scored 32. What was his percentile? 


The length of the thorax in a population of male fruit flies is approximately Normal 
with mean 0.800 millimeters (mm) and standard deviation 0.078 mm. Use the 
68—95-99.7 rule to answer the following questions. 


(a) What range of thorax lengths covers almost all (99.7%) of this distribution? 
(b) What percent of male fruit flies have a thorax length exceeding 0.878 mm? 


A professor knows from past experience that the time for students to complete 
a quiz is normally distributed with mean 21 minutes and standard deviation 
3 minutes. 


(a) If he allows 25 minutes for the quiz, what percent of the students will not com- 
plete the quiz? 

(b) Suppose that he wants to allow sufficient time so that 95% of the students 
will complete the quiz in the allotted time. How much time should he allow for 
the quiz? 

The Aleppo pine and the Torrey pine are widely planted as ornamental trees in 
Southern California. Here are the lengths (in centimeters) of 15 Aleppo pine 
needles:° 


MOA” Tere Tele) 25k UAL MOS) eee IUiNs) Geis) teks) Jc} thf 2X0) HO) ez 


(a) Find the five-number summary for the distribution of Aleppo pine needles. 


Figure 7.3 gives a boxplot for the distribution of the lengths (in centimeters) of 
18 Torrey pine needles. Use this information to help answer the remainder of this 
question. 


(b) The median of the distribution of Torrey pine needles is closest to which of the 
following values? 


24 25 27 30 


(c) Twenty-five percent of the Torrey pine needles exceed what value? 
(d) Given only the length of a needle, do you think you could say which pine spe- 
cies it comes from? Explain briefly. 


34 
32 
30 
28 


26 


Length (centimeters) 


24 


WD) 


20 
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pistiS 
™ BEER IN SOUTH 
ltd 


DAKOTA 
Take a break from doing exercises 
to apply your math to beer cans in 
South Dakota. A newspaper there 
reported that, every year, an average 
of 650 beer cans per mile are tossed 
onto the state’s highways. South 
Dakota has about 83,000 miles of 
roads. How many beer cans is that in 
all? The U.S. Census Bureau says that 
there are about 810,000 people in 
South Dakota. How many beer cans 
does each man, woman, and child in 
the state toss on the road each year? 
That’s pretty impressive. Maybe the 
paper got its numbers wrong. 


FIGURE 7.3 

Boxplot for the distribution of the 
lengths (in centimeters) of 18 Torrey 
pine needles, for Question 7.19. 
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FIGURE 7.4 


Scatterplot of the points earned in the 
2011-12 NHL regular season against 
spending on player salaries for Question 


20) 
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TEAM SPENDING POINTS | TEAM SPENDING POINTS | TEAM SPENDING POINTS 
Flyers 66.9 103 Rangers 61.9 109 Panthers 5S) 94 
Capitals 66.4 92 Devils 61.7 102 Coyotes Daal 97 
Sabres 65.3 89 Sharks 61.4 96 Blues 54.9 109 
Canucks 64.9 111 Blue Jackets 61.3 65 Predators Dye? 104 
Flames 64.2 90 Lightning 60.9 84 Jets 51.8 84 
Penguins 64.2 108 Oilers 60.5 74 Senators Bilet 92 
Kings 63.7 95 Blackhawks 60.2 101 Hurricanes 50.5 82 
Maple Leafs 63.5 80 Red Wings 59.3 102 Stars 49.9 89 
Bruins 62.2 102 Ducks 59.3 80 Avalanche 49.4 88 
Canadiens 62.2 78 Wild 55.8 81 Islanders 49.1 719 


NHL salaries. One can find online’ the amount each NHL team spent on players (in millions 
of dollars) for the 2011-12 NHL season and the total points each team earned by the end of the 
season. Questions 7.20 to 7.23 are based on the NHL data set. 


7.20 Figure 7.4 is a scatterplot of points earned against spending. How would you 
describe the overall pattern? 


7.21 


(a) Sharply curved 
b) Two distinct clusters that are widely separated 
c) A very weak association 

d) A strong positive association 
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The equation of the least-squares regression line for predicting points earned from 
spending is 


points = 66.5 + 0.43 X spending 


What does this tell us about points gained for each additional million dollars spent? 


(a) A team gains about 0.43 points per million dollars spent. 


( 
( 
( 


b) 
c) 


A team gains about 0.665 points per million dollars spent. 
A team gains about 66.5 points per million dollars spent. 
d) A team gains about 66.93 points per million dollars spent. 
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7.22 The equation of the least-squares regression line for predicting points earned from 
spending is 


points = 66.5 + 0.43 X spending 


Use the regression equation to predict the points earned if a team spent $50 million. 
(a) 21.5 (b) 66.5 (c) 88.0 (d) 3325.4 


7.23 The equation of the least-squares regression line for predicting points earned from 
spending is 


points = 66.5 + 0.43 X spending 


Use the regression equation to predict the points a team would earn if it spent no 
money. Which conclusion do you agree with? 
(a) The team will earn 66.5 points. 


(b) The prediction is not sensible, because the prediction is far outside the range of 
values of the response variable. 


(c) The prediction is not sensible, because no money is far outside the range of 
values of the explanatory variable. 


(d) The prediction is not sensible, because of the outlier present in the data. 


Measuring heights of the elderly. Because elderly people may have difficulty standing to 
have their heights measured, a study looked at predicting overall height from height to the knee. 
Here are data (in centimeters) for six elderly men: 


Knee height x 
Height y 


Shall 47.4 43.5 44.8 oom 54.6 
OZ 15333) 146.4 162.7 169.1 177.8 


Questions 7.24 to 7.26 are based on these data. 


7.24 Figure 7.5 is a scatterplot of height of a subject against knee height. Which of 


the following is a plausible value of the correlation, r, between height and knee 
height? 


(a) 0 (b) —0.25 (c) 0.90 (d) —0.90 
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Scatterplot of the heights of six elderly 
men against knee height, for Question 
7.24, 
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7.25 


7.26 


7.27 


7.28 


The equation of the least-squares regression line for predicting the height of a sub- 
ject from his knee height is 


height = 42.93 + 2.45 X knee height 


What does the slope of 2.45 tell us? 


(a) The average height of the elderly subjects in the study is 2.45 centimeters per 
centimeter of knee height. 

(b) The predicted height of the elderly subjects in the study is 2.45 centimeters per 
centimeter of knee height. 

(c) The predicted height of a subject with a knee height of 0 is 42.92 centimeters. 
(d) For each additional centimeter of knee height, the predicted height of a subject 
increases by 2.45 centimeters. 


The equation of the least-squares regression line for predicting the height of a sub- 
ject from their knee height is 


height = 42.93 + 2.45 X knee height 


Use this to predict the height of a person with a knee height of 50 cm. 
(a) 165.43 cm (b) 122.5 cm (c) 95.38 cm (d) 92.93 cm 


How well do people remember their past diet? Data are available for 91 people 
who were asked about their diet when they were 18 years old. Researchers asked 
them at about age 55 to describe their eating habits at age 18. For each subject, 
the researchers calculated the correlation between actual intakes of many foods at 
age 18 and the intakes the subjects now remember. The median of the 91 correla- 
tions was r = 0.217.° Which of the following conclusions is consistent with this 
correlation? 


(a) Subjects remember approximately 21.7% of their food intakes at age 18. 

(b) Subjects remember approximately r? = 0.217? = 0.047 of their food intakes at 
age 18. 

(c) Food intake at age 55 is about 21.7% of food intake at age 18. 

(d) Memory of food intake in the distant past is fair to poor. 


Joe’s retirement plan invests in stocks through an “index fund” that follows the 
behavior of the stock market as a whole, as measured by the Standard & Pootr’s 
(S&P) 500 stock index. Joe wants to buy a mutual fund that does not track the 
index closely. He reads that monthly returns from Fidelity Technology Fund have 
correlation r = 0.77 with the SGP 500 index, and that Fidelity Real Estate Fund 
has correlation r = 0.37 with the index. Which of the following is correct? 


(a) The Fidelity Technology Fund has a closer relationship to returns from the 
stock market as a whole and also has higher returns than the Fidelity Real Estate 
Fund. 

(b) The Fidelity Technology Fund has a closer relationship to returns from the 
stock market as a whole, but we cannot say that it has higher returns than the 
Fidelity Real Estate Fund. 

(c) The Fidelity Real Estate Fund has a closer relationship to returns from the stock 
market as a whole and also has higher returns than the Fidelity Technology Fund. 
(d) The Fidelity Real Estate Fund has a closer relationship to returns from the stock 
market as a whole, but we cannot say that it has higher returns than the Fidelity 
Technology Fund. 


Monkey calls. The usual way to study the brain’s response to sounds is to have subjects listen 
to “pure tones.” The response to recognizable sounds may differ. To compare responses, research- 
ers anesthetized macaque monkeys. They fed pure tones and also monkey calls directly to their 
brains by inserting electrodes. Response to the stimulus was measured by the firing rate (electrical 
spikes per second) of neurons in various areas of the brain. Table 7.1 contains the responses for 
37 neurons.” 
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NEURON TONE CALL NEURON TONE CALL NEURON TONE CALL 
1 474 500 14 145 42 26 il 134 
2 256 138 15 141 241 Di 68 65 
3 241 485 16 129 194 28 59 182 
4 226 338 17 113 123 29 59 97 
5 185 194 18 m2, 182 30 57 318 
6 174 159 19 102 141 31 56 201 
7 176 341 20 100 118 32 47 29 
8 168 85 21 74 62 33 46 62 
9 161 303 ap 72 iW 34 41 84 

10 150 208 a3) 20 193 35 26 203 
11 19 66 24 21 129 36 28 192 
12 20 4} 2S 26 11335) 37 31 70 
13 3p) 103 


Figure 7.6 is a scatterplot of monkey call response against pure-tone response (explanatory vari- 
able). Questions 7.29 and 7.30 refer to these data and the scatterplot. 


500 
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fe) FIGURE 7.6 
o 400 200 age aie a00 Scatterplot of monkey tone response 
Pure-tonhe response against pure-tone response, for 


Questions 7.29 and 7.30. 


7.29 We might expect some neurons to have strong responses to any stimulus and others 
to have consistently weak responses. There would then be a strong relationship 
between tone response and call response. From the scatterplot of monkey call 
response against pure-tone response in Figure 7.6, what would you estimate the 
correlation r to be? 


(a) —0.6 (b) —0.1 (c) 0.1 (d) 0.6 


DATA 


MONKEY 
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7.30 Which of the following statements about the scatterplot in Figure 7.6 is correct? 


(a) There is moderate evidence that pure-tone response causes monkey call 
response. 

(b) There is moderate evidence that monkey call response causes pure-tone 
response. 

(c) There are one or two outliers and at least one of these may also be influential. 
(d) None of the above. 


Catalog shopping. What is the most important reason that students buy from catalogs? The 
answer may differ for different groups of students. Here are counts for samples of American and 
East Asian students at a large midwestern university.!° Use these counts to answer Questions 7.31 
to 7.33, which are based on optional material. 


REASON AMERICAN ASIAN 
Save time 29 10 
Easy 28 tp 
Low price 17 34 
Live far from stores 11 4 
No pressure to buy 10 ) 
Other 20 d 
Total 015} 69 


7.31 What percent of all students say that the most important reason to buy from a 
catalog is to save time? 


(a) 74% (b) 25% (c) 21% (d) 14% 


7.32 What percent of East Asian students say that the most important reason to buy 
from a catalog is low price? 


(a) 67% (b) 49% (c) 28% (d) 18% 
7.33 What are the most important differences between the two groups of students? 


(a) The most important reasons for American students to buy from a catalog are to 
save time and because it is easy, while for East Asian students it is low price. 

(b) American students appear to be almost three times more likely to live far from 
stores than East Asian students. 

(c) East Asian students are twice as likely to purchase from a catalog because of low 
price than American students. 


(d) All of the above. 


Investment strategies. One reason to invest abroad is that markets in different countries don’t 
move in step. When American stocks go down, foreign stocks may go up. So an investor who holds 
both bears less risk. That’s the theory. But then we read in a magazine article that the correlation 
between changes in American and European stock prices rose from 0.4 in the mid-1990s to 0.8 in 
2000.'! Questions 7.34 and 7.35 refer to this article. 


7.34 Explain to an investor who knows no statistics why the fact stated in this article 
reduces the protection provided by buying European stocks. 


7.35 The same article claiming that the correlation between changes in stock prices 
in Europe and the United States is 0.8 goes on to say: “Crudely, that means that 
movements on Wall Street can explain 80% of price movements in Europe.” 

(a) Is this true? 
(b) What is the correct percent explained if r = 0.8? 


7.36 Researchers wished to determine whether individual differences in introspective 
ability are reflected in the anatomy of brain regions responsible for this function. 


They measured introspective ability (using a score on a test of introspective ability, 
with larger values indicating greater introspective ability) and gray-matter volume 
in milliliters (the Brodmann area) in the anterior prefrontal cortex of the brain of 
29 subjects. Here are the data: 


Introspective ability OMS; Ost} (OS) OLS) OE) Oeil Oo? Of) Oe} Oo 


Volume 


59 62 43 63 83 61 5 Si 57 67 


Introspective ability 0.63 0.64 0.65 0.65 0.65 0.65 0.65 0.66 0.66 0.67 


Volume 


ie 62 58 62 65 70 75 60 63 71 


Introspective ability 0.67 0.67 0.68 0.69 0.70 0.70 0.71 0.72 0.75 


Volume 


71 80 68 Ae, 66 73 61 80 75 


7.37 


The researchers wished to determine the equation of the least-squares regression 
line for predicting gray-matter volume ability (y) from introspective ability (x). To 
do this, they calculated the following summary statistics: 


x = 0.649, s, = 0.045 
y = 65.90, s, = 8.69 
r = 0.448 


(a) Use these summary statistics to calculate the equation of the least-squares 
regression line. 

(b) Based on the least-squares regression line, what would you predict gray-matter 
volume to be for someone with introspective ability 0.60? 

(c) Based on the least-squares regression line, what would you predict gray-matter 
volume to be for someone with introspective ability 0.99? How reliable do you 
think this prediction is? Explain your answer. 


Animals and people that take in more energy than they expend will get fatter. Here 
are data on 12 rhesus monkeys: 6 lean monkeys (4% to 9% body fat) and 6 obese 
monkeys (13% to 44% body fat). The data report the energy expended in 24 hours 
(kilojoules per minute) and the lean body mass (kilograms, leaving out fat) for each 
monkey.” 


LEAN OBESE 
MASS ENERGY | MASS ENERGY 
6.6 leit 7) Me9) 0.93 
7.8 1.02 9.4 1.39 
8.9 1.46 10.7 TLS) 
9.8 1.68 Wet 149. 
eel 1.06 eel 1.29 
5} 1.16 10.8 Sil 


(a) Compute the mean lean body mass of the lean monkeys. 

(b) Compute the mean lean body mass of the obese monkeys. 

(c) The goal of the study is to compare the energy expended in 24 hours by the lean 
monkeys with that of the obese monkeys. However, animals with higher lean mass 
usually expend more energy. Based on your calculations in parts (a) and (b), would 
it make sense to simply compute the mean energy expended by lean and obese 
monkeys and compare the means? Explain. 

(d) To investigate how energy expended is related to body mass, make a scatterplot 
of energy versus mass, using different plot symbols for lean and obese monkeys. 

(e) What do the trends in your scatterplot suggest about the monkeys? 
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SMOKERS 


7.38 


The number of adult Americans who smoke continues to drop. Here are estimates 
of the percents of adults (aged 18 and over) who were smokers in the years between 
1965 and 2011:¥ 


Year x 1965 1974 1979 1983 1987 1990 1993 
Smokersy 41.9 37.0 333 319 286 25.3 24.8 
Year x 1997 2000 2002 2006 2009 2011 
Smokersy 24.6 23.1 22.5 20.8 20.6 19.0 


(a) Make a scatterplot of these data. 

(b) Describe the direction, form, and strength of the relationship between percent 
of smokers and year. Are there any outliers? 

(c) Here are the means and standard deviations for both variables and the correla- 
tion between percent of smokers and year: 


x = 1992.0, s, = 14.0 
y = 27.2,s, = 7.0 
r = —0.98 


Use this information to find the least-squares regression line for predicting percent 
of smokers from year and add the line to your plot. 

(d) According to your regression line, how much did smoking decline per year dur- 
ing this period, on the average? 

(e) What percent of the observed variation in percent of adults who smoke can be 
explained by linear change over time? 

(f) One of the government’s national health objectives is to reduce smoking to no 
more than 12% of adults by 2020. Use your regression line to predict the percent of 
adults who will smoke in 2020. 

(g) Use your regression line to predict the percent of adults who will smoke in 2075. 
Why is your result impossible? Why was it foolish to use the regression line for this 
prediction? 


(Optional topic) People who get angry easily tend to have more heart disease. 
That’s the conclusion of a study that followed a random sample of 12,986 people 
from three locations for about four years. All subjects were free of heart disease at 
the beginning of the study. The subjects took the Spielberger Trait Anger Scale 
test, which measures how prone a person is to sudden anger. Here are data for the 
8474 people in the sample who had normal blood pressure. CHD stands for “coro- 
nary heart disease.” This includes people who had heart attacks and those who 
needed medical treatment for heart disease." 


LOW ANGER MODERATE ANGER HIGH ANGER | TOTAL 
CHD 53 110 oll 190 
No CHD 3057 4621 606 8284 
Total 3110 4731 633 8474 


(a) What percent of all 8474 people with normal blood pressure had CHD? 

(b) What percent of all 8474 people were classified as having high anger? 

(c) What percent of those classified as having high anger had CHD? 

(d) What percent of those with no CHD were classified as having moderate 
anger? 

(e) Do these data provide any evidence that as anger score increases, the percent 
who suffer CHD increases? Explain. 


Supplementary exercises apply the skills you have learned in ways 
that require more thought or more elaborate use of technology. Some 
of these exercises ask you to follow the Plan, Solve, and Conclude 
steps of the four-step process introduced on page 63. 


7.40 The Mississippi River. Table 7.2 gives the volume 
of water discharged by the Mississippi River into the 
Gulf of Mexico for each year from 1954 to 2001." The 
units are cubic kilometers of water—the Mississippi is 


a big river. 2) missip 


i 


AP Photo/The Gleaner, Mike Lawrence 


(a) Make a graph of the distribution of water volume. 
Describe the overall shape of the distribution and any 
outliers. 

(b) Based on the shape of the distribution, do you expect 
the mean to be close to the median, clearly less than the 
median, or clearly greater than the median? Why? Find 
the mean and the median to check your answer. 


Supplementary Exercises 
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(c) Based on the shape of the distribution, does it seem 
reasonable to use x and s to describe the center and 
variability of this distribution? Why? Find x and s if 
you think they are a good choice. Otherwise, find the 
five-number summary. 


7.41 More on the Mississippi River. The data in 
Table 7.2 are a time series. Make a time plot that 
shows how the volume of water in the Mississippi 
changed between 1954 and 2001. What does the time 
plot reveal that the histogram from Exercise 7.40 does 
not? It is a good idea to always make a time plot of time 
series data because a histogram cannot show changes 
over time. 


Falling through the ice. The Nenana Ice Classic is an annual 
contest to guess the exact time in the spring thaw when a tripod 
erected on the frozen Tanana River near Nenana, Alaska, will fall 
through the ice. The 2013 jackpot prize was $318,500. The contest 
has been run since 1917. Table 7.3 gives simplified data that record 


YEAR DISCHARGE | YEAR DISCHARGE |YEAR DISCHARGE 
1954 290 1970 540 1986 600 
1955 420 1971 480 1987 450 
1956 390 1972 600 1988 420 
1957 610 1973 880 1989 630 
1958 550 1974 710 1990 680 
1959 440 1975 670 1991 700 
1960 470 1976 420 1992 510 
1961 600 1977 430 1993 900 
1962 550 1978 560 1994 640 
1963 360 1979 800 1995 590 
1964 390 1980 500 1996 670 
1965 500 1981 420 1997 680 
1966 410 1982 640 1998 690 
1967 460 1983 770 1999 580 
1968 510 1984 710 2000 390 
1969 560 1985 680 2001 580 
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only the date on which the tripod fell each year. The earliest date so 
far is April 20. To make the data easier to use, the table gives the 
date each year in days starting with April 20. That is, April 20 is 
1, April 21 is 2, and so on. Exercises 7.42 to 7.44 concern these 


data.'® 


7.42 


7.43 


7.44 


7.45 


When does the ice break up? We have 97 years 
of data on the date of ice breakup on the Tanana 
River. Describe the distribution of the breakup date 
with both a graph or graphs and appropriate numerical 
summaries. What is the median date (month and day) 
for ice breakup? TANANA 


Global warming? Because of the high stakes, the 
falling of the tripod has been carefully observed for 
many years. If the date the tripod falls has been getting 
earlier, that may be evidence for the effects of global 
warming. 


(a) Make a time plot of the date the tripod falls against 
year. 

(b) There is a great deal of year-to-year variation. Fit- 
ting a regression line to the data may help us see the 
trend. Fit the least-squares line and add it to your time 
plot. What do you conclude? 

(c) There is much variation about the line. Give a 
numerical description of how much of the year-to-year 
variation in ice breakup time is accounted for by the 
time trend represented by the regression line. (This 
simple example is typical of more complex evidence for 
the effects of global warming: large year-to-year varia- 
tion requires many years of data to see a trend.) 


More on global warming. Side-by-side boxplots 
offer a different look at the data. Group the data into 
periods of roughly equal length: 1917 to 1940, 1941 to 
1964, 1965 to 1988, and 1989 to 2013. Make boxplots 
to compare ice breakup dates in these four time peri- 
ods. Write a brief description of what the plots show. 


Teachers’ salaries. The Organisation for Economic 
Co-operation and Development (OECD) began in 
1961 to stimulate economic progress and world trade. 
It originally consisted of European countries, the U.S., 
and Canada, but has now grown to include 34 coun- 
tries spanning the globe. The data file TEACHSL on 
the website gives the average starting salaries in 2012 
of primary public school teachers (PPP, US$) for the 


member nations.!? 


(a) Make a stemplot or a histogram to display the dis- 
tribution of teachers’ salaries in the OECD countries. 
(b) There is one high outlier. What country is this? 
What is the overall shape of the distribution if you 
ignore the outlier? 

(c) Based on your work in part (b), give a numerical 
summary of the center and variability of the distribution, 
omitting the outlier. 


7.46 


(d) Some Americans complain about teachers being 
overpaid. Where does the United States ($36,858) 


stand in this international comparison? 


Cicadas as 
fertilizer? Every 
17 years, swarms 
of cicadas emerge 
from the ground 
in the eastern 
United — States, 
live for about 
six weeks, then die. (There are several “broods, 
so we experience cicada eruptions more often than 
every 17 years.) There are so many cicadas that 
their dead bodies can serve as fertilizer and increase 
plant growth. In an experiment, a researcher added 
10 cicadas under some plants in a natural plot of 
American bellflowers in a forest, leaving other plants 
undisturbed. One of the response variables was the 
size of seeds produced by the plants. Here are data 
(seed mass in milligrams) for 39 cicada plants and 
33 undisturbed (control) plants:!® 


Alastair Shay; Papilio/Corbis 
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CICADA PLANTS CONTROL PLANTS 
0.237 0.277 0.241 0.142 |0.212 0.188 0.263 0.253 
OM OSEIE 0209 BO'23 8027/7, an| (0:2 010265 OSD e070 
ORZ 6102227 SOF O28 5a O82 OS O24 O22 57, Oh 55 
0.276 0.234 0.255 0.296 |0.215 0.285 0.198 0.266 
0.239 0.266 0.296 0.217 | 0.178 0.244 0.190 0.212 
0.238 0.210 0.295 0.193 |0.290 0.253 0.249 0.253 
O28 9 80:263) SOS055 0257, N0268— 10190 S10196 0.220 
0.351 0.245 0.226 0.276 | 0.246 0.145 0.247 0.140 
0.317 0.310 0.223 0.229 | 0.241 
O92 0: 201 O21 
Describe and compare the two distributions. Do the 
data support the idea that dead cicadas can serve as 
fertilizer? 
7.47 A big-toe problem. Hallux abducto valgus (call 


it HAV) is a deformation of the big toe that is not 
common in youth and often requires surgery. Doctors 
used X-rays to measure the angle (in degrees) of 
deformity in 38 consecutive patients under the age of 
21 who came to a medical center for surgery to correct 
HAV.” The angle is a measure of the seriousness of 
the deformity. The data appear in Table 7.4 as “HAV 
angle.” Describe the distribution of the angle of defor- 
mity among young patients needing surgery for this 
condition. {jj)) Biatoe 


=| 
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HAV MA HAV MA HAV MA 
ANGLE ANGLE ANGLE ANGLE ANGLE ANGLE 
28 18 2 15 16 10 
ae 16 7 16 30 12 
25 LD 16 10 30 10 
34 17 21 7 20 10 
38 33) Hs) 11 50 WE 
26 10 4: 15 25 25 
25 18 oy 12 26 30 
18 13 25 16 28 WY) 
30 19 21 16 Sil 24 
26 10 22 18 38 20 
28 17 20 10 32 37 
13 14 18 5) Dil 23) 

20 20 26 16 
7.48 Prey attract predators. Here is one way in which 


7.49 


nature regulates the size of animal populations: High 
population density attracts predators, who remove a 
higher proportion of the population than when the 
density of the prey is low. One study looked at kelp 
perch and their common predator, the kelp bass. The 
researcher set up four large circular pens on sandy 
ocean bottom in Southern California. He chose young 
perch at random from a large group and placed 10, 20, 
40, and 60 perch in the four pens. Then he dropped 
the nets protecting the pens, allowing bass to swarm 
in, and counted the perch left after 2 hours. Here are 
data on the proportions of perch eaten in four repeti- 
tions of this setup:7° PREY 


PERCH PROPORTION KILLED 
10 0.0 0.1 OSmLOS: 
20 0.2 0.3 0.3 0.6 
40 0.075 0.3 0.6 0.725 
60 O70 Sa Os Or 1 


Do the data support the principle that “more prey attract 
more predators, who drive down the number of prey”? 


Predicting foot problems. Metatarsus adductus 
(call it MA) is a turning in of the front part of the 
foot that is common in adolescents and usually cor- 
rects itself. Table 7.4 gives the severity of MA (“MA 
angle”). Doctors speculate that the severity of MA 
can help predict the severity of HAV. Describe the 
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relationship between MA and HAV. Do you think the 
data confirm the doctors’ speculation? Why or why 
not? £qy,| BIGTOE 


=| 
it 


Change in the Serengeti. Long-term records from 
the Serengeti National Park in Tanzania show inter- 
esting ecological relationships. When wildebeest are 
more abundant, they graze the grass more heavily, so 
there are fewer fires and more trees grow. Lions feed 
more successfully when there are more trees, so the 
lion population increases. Here are data on one part 
of this cycle, wildebeest abundance (in thousands of 
animals) and the percent of the grass area that burned 
in the same year:”' Sy) sERENG 


Gallo Images-Anthony Bannister/ 


: 
g 
WILDEBEEST PERCENT WILDEBEEST PERCENT 
(1000s) BURNED (1000s) BURNED 
396 56 622 60 
476 50 600 56 
698 25 902 45 
1049 16 1440 21 
1178 7 esa 32 
1200 5 7) 31 
1302 u 1178 24 
360 88 1253 24 
444 88 1249 53 
524 75 


To what extent do these data support the claim that 
more wildebeest reduce the percent of grasslands that 
burn? How rapidly does burned area decrease as the 
number of wildebeest increases? Include a graph and 
suitable calculations. 


Casting aluminum. In casting metal parts, molten 
metal flows through a “gate” into a die that shapes the 
part. The gate velocity (the speed at which metal is 
forced through the gate) plays a critical role in die cast- 
ing. A firm that casts cylindrical aluminum pistons exam- 
ined 12 types formed from the same alloy. How does the 
piston wall thickness (inches) influence the gate velocity 
(feet per second) chosen by the skilled workers who do 
the casting? If there is a clear pattern, it can be used to 
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direct new workers or to automate the process. Analyze 
these data and report your findings.” £j),) aLum 


=| 
= 


THICKNESS VELOCITY | THICKNESS VELOCITY 
0.248 123.8 0.628 326.2 
0.359 ESS) 0.697 302.4 
0.366 180.9 0.697 145.2 
0.400 104.8 0.752 263.1 
0.524 228.6 0.806 302.4 
0.552 223.8 0.821 302.4 


Texting and driving. The Ohio Youth Risk Behav- 
ior Survey is conducted every two years in a sample 
of high schools across the state of Ohio. One of the 
questions on the 2013 survey was, “During the past 
30 days, on how many did you text or email while 
driving a car or other vehicle?” Here are the survey 


results:23 el) TEXTING 


DAYS FEMALE MALE 
0 263 229 
1-9 96 100 
10-19 19 27 
20-30 53 53 
Total 431 409 


Write a brief analysis of these results that focuses on 
the relationship between sex and driving behavior. 


7.53 Influence: hot sector funds? Investment adver- 
tisements always warn that “past performance does 
not guarantee future results.” Here is an example that 
shows why you should pay attention to this warning. 
Stocks fell sharply in 2002, then rose sharply in 2003. 
The table below gives the percent returns from 23 
Fidelity Investments “sector funds” in these two years. 
Sector funds invest in narrow segments of the stock 
market. They often rise and fall faster than the market 
as a whole. £jj,) MruNDs 

2002 2003 2002 2003 2002 2003 
RETURN RETURN | RETURN’ RETURN | RETURN’ RETURN 
7 23.9 —0.7 36.9 —37.8 59.4 
—6.7 14.1 —5.6 2S) =jULS DES 
= ile 41.8 —26.9 26.1 0) 36.9 
se) 43.9 —42.0 62.7 64.3 Syl 
—18.9 Silall —47.8 68.1 —9.6 28.7 
Fe S39 ear 
Sy 36.5 —49.5 57.0 5) 19.1 

—11.4 30.6 —23.4 35.0 
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(a) Make a scatterplot of 2003 return (response) against 
2002 return (explanatory). The funds with the best per- 
formance in 2002 tend to have the worst performance 
in 2003. Fidelity Gold Fund, the only fund with a posi- 
tive return in both years, is an extreme outlier. 

(b) To demonstrate that correlation is not resistant, 
find r for all 23 funds and then find r for the 22 funds 
other than Gold. Explain from Gold’s position in your 
plot why omitting this point makes r more negative. 
(c) Find the equations of two least-squares lines for 
predicting 2003 return from 2002 return, one for all 
23 funds and one omitting Fidelity Gold Fund. Add 
both lines to your scatterplot. Starting with the least- 
squares idea, explain why adding Fidelity Gold Fund to 
the other 22 funds moves the line in the direction that 
your graph shows. 


Influence: monkey calls. Table 7.1 contains data 

on the response of 37 monkey neurons to pure tones 

and to monkey calls. Figure 7.6 is a scatterplot of these 
Ij)) MONKEY 


(a) Find the least-squares line for predicting a neuron’s 
call response from its pure tone response. Add the line 
to your scatterplot. Mark on your plot the point (call 
it A) with the largest residual (either positive or nega- 
tive) and also the point (call it B) that is an outlier in 
the x direction. 

(b) How influential are each of these points for the 
correlation 1? 

(c) How influential are each of these points for the 
regression line? 


Influence: bushmeat. Table 7.5 gives data on fish 
catches in a region of West Africa and the percent 
change in the biomass (total weight) of 41 animals 
in nature reserves. It appears that years with smaller 
fish catches see greater declines in animals, probably 
because local people turn to “bushmeat” when other 
sources of protein are not available. The next year 
(1999) had a fish catch of 23.0 kilograms per person 
and animal biomass change of —22.9%. {j))) BUSHMT 


(a) Make a scatterplot that shows how change in ani- 
mal biomass depends on fish catch. Be sure to include 
the additional data point. Describe the overall pattern. 
The added point is a low outlier in the y direction. 

(b) Find the correlation between fish catch and change 
in animal biomass both with and without the outlier. 
The outlier is influential for correlation. Explain from 
your plot why adding the outlier makes the correlation 
smaller. 

(c) Find the least-squares line for predicting change in 
animal biomass from fish catch both with and without 
the additional data point for 1999. Add both lines to 
your scatterplot from part (a). The outlier is not influ- 
ential for the least-squares line. Explain from your plot 
why this is true. 
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FISH SUPPLY BIOMASS CHANGE 


FISH SUPPLY BIOMASS CHANGE 


YEAR (KILOGRAMS PER PERSON) (PERCENT) YEAR (KILOGRAMS PER PERSON) (PERCENT) 
1971 34.7 2.9 1985 Pile) 50) 
1972 39.3 Zell 1986 24.3 =0)7/ 
1973 32.4 =I 1987 27.4 =, 
1974 31.8 =i 1988 24.5 Sel 
1975 32.8 =15}55) 1989 MB \ep! S42 
1976 38.4 dell 1990 25.9 0.9 
1977 Boe. 1.9 1991 23:0) = Ooll 
1978 29.7 =0)53) 1992 Biel —4.1 
1979 25.0) —5.9 1993 23.4 —4.8 
1980 21.8 =) 1994 18.9 =i. 
1981 20.8 = 55) 1995 19.6 hs} 
1982 19.7 =, 1996 25.3 = 10,7 
1983 20.8 —4.1 1997 EAO) = Ile} 
1984 JiAlesl —8.6 1998 ZARO —7.4 
7.56 Python eggs. How is the hatching of water python eggs hatched, along with 38 of the neutral eggs and 16 


eggs influenced by the temperature of the snake’s nest? 
Researchers placed 104 newly laid eggs in a hot envi- 
ronment, 56 in a neutral environment, and 27 ina cold 
environment. Hot duplicates the warmth provided by 
the mother python. Neutral and cold are cooler, as 
when the mother is absent. The results: 75 of the hot 


of the cold eggs.” 


(a) Make a two-way table of “environment tempera- 
ture” against “hatched or not.” 

(b) The researchers anticipated that eggs would hatch 
less well at cooler temperatures. Do the data support 
that anticipation? 
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Online Data for Additional Analyses 


SAT; ACT, and teacher salaries for 2013 for each of the 50 states and the District of 
Columbia are available in the data set SATACT: One could use these data to carry 
out analyses for ACT scores similar to those for the SAT scores in Chapters 5 and 6. 
For example, repeat the analyses in Exercises 5.50 (page 157) and 5.51 (page 158) 
using ACT scores instead of SAT scores. elu) SATACT 


The data set MLB contains hitting, pitching, fielding, salary, and win—loss perfor- 
mance data from the 2013 season for all major league baseball teams. These data 
can be used to determine the correlation between payroll and winning percent- 
age. One can also explore what variables are most highly correlated with winning 
percentage, and whether variables that measure pitching performance are more 
highly correlated with winning percentage than variables that measure hitting 
performance. For example, calculate the correlation between winning percentage 
and number of home runs, between winning percentage and batting average, be- 
tween winning percentage and ERA, between winning percentage and strikeouts 
by pitchers, and between winning percentage and payroll. Which has the highest 
correlation? These data are from http://www.baseball-reference. 
com/. Visit this website for definitions of several of the variables in the data 
set. yy.) MLB 


Historical temperature data and whether Punxsutawney Phil saw his shadow are 
available in the data set PHIL. Repeat the analysis in Exercise 6.32 (page 177), 
but define what constitutes “six more weeks of winter-like weather” differently. 
For example, you might decide there were six more weeks of winter-like weather if 
average temperatures for March were at least one degree below historical averages. 


Data from the Ohio Department of Health website are available in the pdf 
“20130OHH Detail Tables.” This is a source of many tables that can be used for 
further analyses using methods discussed in Chapter 6. For example, conduct an 
analysis like that in Exercise 7.52 to investigate the relationship between sex and 
strategies about weight (Question 67 in the Tables). Eli) HEALTH 


The data set WHAT contains three variables and 3848 observations on each. At 
one time, this was considered a large data set and difficult to explore with software. 
Use various exploratory methods available in software packages such as JMP and 
Minitab to find the “hidden pattern” in these data. WHAT 
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Producing Data 


he purpose of statistics is to gain understanding from data. We can 

seek understanding in different ways, depending on the circumstances. 

We have studied one approach to data, exploratory data analysis, in 

some detail. Now we begin the move from data analysis toward statis- 
tical inference. Both types of reasoning are essential to effective work with data. 
Here is a brief sketch of the differences between them: 


EXPLORATORY DATA ANALYSIS STATISTICAL INFERENCE 


The purpose is unrestricted The purpose is to answer specific 
exploration of the data, searching questions, posed before the data 
for interesting patterns. were produced. 

Conclusions apply only to the Conclusions apply to a larger group 
individuals and circumstances for of individuals or a broader class of 
which we have data in hand. circumstances. 


Conclusions are informal, based on Conclusions are formal, backed by a 


what we see in the data. statement of our confidence in them. 


Our journey toward inference begins in Chapters 8-10, which describe 
statistical designs for producing data by samples and experiments, and the 
ethical issues involved. The important lesson of Part Il is that the quality of the 
inferences we make from data depends heavily on how the data are produced. 


PRODUCING DATA 


CHAPTER 8 Producing 
Data: Sampling 


CHAPTER 9 Producing 
Data: Experiments 


CHAPTER 10 Data 
Ethics* 


CHAPTER 11 Producing 
Data: Part Il Review 
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8.1 Population versus sample 

8.2 How to sample badly 

8.3 Simple random samples 

8.4 Inference about the population 
8.5 Other sampling designs 


8.6 Cautions about sample 
tatistics, the science of data, provides ideas and tools that we can use in many surveys 


settings. Sometimes we have data that describe a group of individuals, and 
want to learn what the data say. That’s the job of exploratory data analysis. 
Sometimes we have specific questions, but no data to answer them. To get sound 
answers, we must produce data in a way that is designed to answer our questions. 
Suppose our question is, “What percent of college students think that people 
should not obey laws that violate their personal values?” To answer the question, 
we interview undergraduate college students. We can’t afford to ask all students, 
so we put the question to a sample chosen to represent the entire student population. 
How shall we choose a sample that truly represents the opinions of the entire 
population? Statistical designs for choosing samples are the topic of this chapter. 
We will see that 


8.7 The impact of technology 


A sound statistical design is necessary if we are to trust data from a sample 
for drawing sound conclusions about the population. 


In sampling from large human populations, however, “practical problems” 
can overwhelm even sound designs. 


The impact of technology (particularly cell phones and the Internet) is 
making it harder to produce trustworthy national data by sampling. 
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sample survey 


A political scientist wants to know what percent of college-age adults consider 
themselves conservatives. An automaker hires a market research firm to learn what 
percent of adults aged 18 to 35 recall seeing television advertisements for a new 
gas—electric hybrid car. Government economists inquire about average household 
income. In all these cases, we want to gather information about a large group of 
individuals. Time, cost, and inconvenience preclude contacting every individual. 
So we gather information about only part of the group to draw conclusions about 
the whole. 


Population, Sample, Sampling Design 


The population in a statistical study is the entire group of individuals about which 
we want information. 


A sample is a part of the population from which we actually collect information. We 
use a sample to draw conclusions about the entire population. 


A sampling design describes exactly how to choose a sample from the population. 


Pay careful attention to the details of the definitions of “population” and “sample.” 
Read and answer Exercise 8.1 right now to check your understanding. 

We often draw conclusions about a whole on the basis of a sample. Everyone has 
tasted a sample of ice cream and ordered a cone on the basis of that taste. But ice 
cream is uniform, so that the single taste represents the whole. Choosing a repre- 
sentative sample from a large and varied population is not so easy. The first step in 
planning a sample survey is to say exactly what population we want to describe. The 
second step is to say exactly what we want to measure, that is, to give exact defini- 
tions of our variables. These preliminary steps can be complicated, as the following 
example illustrates. 


The most important government sample survey in the United States is the monthly 
Current Population Survey (CPS) conducted by the Bureau of the Census for the 
Bureau of Labor Statistics. The CPS contacts about 60,000 households each month. 
It produces the monthly unemployment rate and much other economic and social 
information. (See Figure 8.1.) To measure unemployment, we must first specify the 
population we want to describe. Which age groups will we include? Will we include 
illegal immigrants or people in prisons? The CPS defines its population as all U.S. 
residents (legal or not) 16 years of age and over who are civilians and are not in an 
institution such as a prison. The unemployment rate announced in the news refers to 
this specific population. 

The second question is harder: what does it mean to be “unemployed”? Someone 
who is not looking for work—for example, a full-time student—should not be called 
unemployed just because she is not working for pay. If you are chosen for the CPS 
sample, the interviewer first asks whether you are available to work and whether you 
actually looked for work in the past four weeks. If not, you are neither employed nor 
unemployed—you are not in the labor force. So discouraged workers who haven't 
looked for a job in four weeks are excluded from the count. 

If you are in the labor force, the interviewer goes on to ask about employment. 
If you did any work for pay or in your own business during the week of the survey, 
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UNITED STATES DEPARTMENT OF LABOR RteTind: | Ce | Aint RS | tt FIGURE 8.1 


The home page of the Current 
Population Survey at the Bureau 
of Labor Statistics. 


* BUREAU OF LABOR STATISTICS 


Subject Areas + Databases & Tools ~ Publications + Economic Releases + 


Job openings rate increased to 2.3% in February from 2.1% in January 

There were 3.1 million job openings on the last business day of February, an increase from 2.7 
milion in January. The hires rate (3.0 percent) and separations rate (2.9 percent) were itte 
changed over the month. 

HTM, | POE 


March import prices advance 2.7%; export prices rise 1.5% 

U.S. import prices rose 2.7 percent in March, following a 1.4 percent advance in February. The 
March increase was driven by both higher fuel and nonfuel prices. The price index for U.S. 
exports increased 1.5 percent in March after rising 1.4 percent the previous month. 
HIM. | PDE 


A Profile of the Working Poor, 2009 


In October 2010, 68.1% of 2010 high schoo! graduates wore enrolled in college In 2009, about 43.6 million people, or 14.3 
Metropolitan Area Employment and Unemployment (Monthly) percent of the Nation's population, lived at 
Payroll employment increases 216,000 in March; unemployment rate (8.8 percent) changes... or below the official poverty level. 
Multifactor productivity rose modestly at 0.1% in 2009; output, combined inputs decline read mere » 
Eikhart, Ind., employment grows 6.8% in September 2010; nationally, jobs increase 0.2% 

read more » 


REGIONAL PRODUCTS » < m1 2)|3//4)/5 > 

Archives » 

LATEST NUMBERS 

Consumer Price index (CPI): 

40.5% in Feb 2011 ~e 

Unemployment Rate: 

8.0% in Mar 2011 ~= 

Payroll Employment: 

+216,000(p) in Mar 2011 ae 

Click an the map fo select region percent from De es 
unchanged in Mar 2011 a =F 


co) 
Producer Price index (PPI): 
+1.6%(p) in Feb 2011 a= 


Employment Cost Index (EC!): 


US. Bureau of Labor Statistics 


New England: 

‘) +0.4% in 4th Qtr of 2010 = 
Productivity: 
*2.5% in 4th Qtr of 2010 a= 
U.S. import Price Index: 
+2.7% in Mar 2011 aE 
U.S. Export Price index: 
1.5% in Mar 2011 ~= 


you are employed. If you worked at least 15 hours in a family business without pay, 
you are employed. You are also employed if you have a job but didn’t work because 
of vacation, being on strike, or other good reason. An unemployment rate of 6.7% 
means that 6.7% of the sample was unemployed, using the exact CPS definitions of 
both “labor force” and “unemployed.” m 


The final step in planning a sample survey is the sampling design. We will now 
introduce basic statistical designs for sampling. 


(9) LaunchPad Online Resources 


* The Snapshots video, Sampling, provides a good overview of why we need 
to sample and the distinction between populations and samples. 


Apply Your Knowledge 


8.1. Sampling Students. A political scientist wants to know how college stu- 
dents feel about the Social Security system. She obtains a list of the 3456 
undergraduates at her college and mails a questionnaire to 250 students se- 
lected at random. Only 104 questionnaires are returned. 


(a) What is the population in this study? Be careful: about what group 
does she want information? 

(b) What is the sample? Be careful: from what group does she actually 
obtain information? 

The important message in this problem is that the sample can redefine the 

population about which information is obtained. 
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convenience sample 


8.2 Student Archaeologists. An archaeological dig turns up large numbers of 
pottery shards, broken stone implements, and other artifacts. Students work- 
ing on the project classify each artifact and assign it a number. The counts 
in different categories are important for understanding the site, so the proj- 
ect director chooses 2% of the artifacts at random and checks the students’ 
work. What are the population and the sample here? 


8.3 Software Survey. A statistical software company is planning on updating 
Version 8.1 of its software and wants to know what features are most impor- 
tant to users. The company’s managers have the email addresses of 1100 
individuals, mostly faculty at universities, for whom they have supplied free 
courtesy copies of Version 8.1. They email these 1100 individuals and ask 
them to complete a survey online. A total of 186 of these individuals com- 
plete the survey. 


(a) What is the population of interest to the software company? Do you think 
the 1100 individuals contacted are representative of the population? 
Explain your reasons. 


(b) What is the sample? From what group is information actually obtained? 


How can we choose a sample that we can trust to represent the population? A 
sampling design is a specific method for choosing a sample from the population. 
The easiest—but not the best—design just chooses individuals close at hand. If 
we are interested in finding out how many people have jobs, for example, we might 
go to a shopping mall and ask people passing by if they are employed. A sample 
selected by taking the members of the population that are easiest to reach is called 
a convenience sample. Convenience samples often produce unrepresentative data. 


A sample of mall shoppers is fast and cheap. But people at shopping malls tend to 
be more prosperous than typical Americans. They are also more likely to be teenagers 
or retired. Moreover, unless interviewers are carefully trained, they tend to question 
well-dressed, respectable-looking people and avoid poorly dressed or tough-looking 
individuals. The types of people at the mall will also vary by time of day and day of 
week. In short, mall interviews will not contact a sample that is representative of the 
entire population. m 


Interviews at shopping malls will almost surely overrepresent middle-class and 
retired people and underrepresent the poor. This will happen almost every time we 
take such a sample. That is, it is a systematic error caused by a bad sampling design, 
not just bad luck on one sample. This is bias: the outcomes of mall surveys will 
repeatedly miss the truth about the population in the same ways. 


The design of a statistical study is biased if it systematically favors certain outcomes. 
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Former CNN evening commentator Lou Dobbs doesn't like illegal immigration. One of 
his broadcasts in 2007 was largely devoted to attacking a proposal by the governor 
of New York State to offer driver's licenses to illegal immigrants as a public safety 
measure. During the show, Mr. Dobbs invited his viewers to go to loudobbs . com to 
vote on the question, “Would you be more or less likely to vote for a presidential can- 
didate who supports giving drivers’ licenses to illegal aliens?” We aren’t surprised that 
97% of the 7350 people who voted by the end of the broadcast said, “Less likely.” = 


The loudobbs.com poll was biased because people chose whether or not to 
participate (see Exercise 8.33). Most who voted were viewers of Lou Dobbs’s pro- 
gram who had just heard him denounce the governor's idea. People who take the 

trouble to respond to an open invitation are usually not representative of any 
A clearly defined population. That’s true of the people who bother to respond 

to write-in, call-in, or online polls in general. Polls like these are examples 
of voluntary response sampling. 


Voluntary Response Sample 


A voluntary response sample consists of people who choose themselves by re- 
sponding to a broad appeal. Voluntary response samples are biased because people 
with strong opinions are most likely to respond. 


Apply Your Knowledge 


8.4 Sampling on Campus. You see a student standing in front of the student 
center, now and then stopping other students to ask them questions. She says 
that she is collecting student opinions for a class assignment. Explain why 
this sampling method is almost certainly biased. 


8.5 More Sampling on Campus. You would like to start a club for psychology 
majors on campus, and you are interested in finding out what proportion of 
psychology majors would join. The dues would be $35 and used to pay for 
speakers to come to campus. You ask five psychology majors from your senior 
psychology honors seminar whether they would be interested in joining this 
club and find that four of the five students questioned are interested. Is this 
sampling method biased, and if so, what is the likely direction of bias? 


In a voluntary response sample, people choose whether to respond. In a conve- 
nience sample, the interviewer makes the choice. In both cases, personal choice 
produces bias. The statistician’s remedy is to allow impersonal chance to choose the 
sample. A sample chosen by chance rules out both favoritism by the sampler and 
self-selection by respondents. Choosing a sample by chance attacks bias by giving 
all individuals an equal chance to be chosen. Rich and poor, young and old, liberal 
and conservative, all have the same chance to be in the sample. 

The simplest way to use chance to select a sample is to place names in a hat (the 
population) and draw out a handful (the sample). This is the idea of simple random 
sampling. Although the idea of drawing names from a hat is a good way to concep- 
tualize a simple random sample, it is generally not a good method for obtaining a 
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TQ are THESE 
wih RANDOM DiciTS 
REALLY RANDOM? 


Not a chance. The random digits in 
Table B were produced by a computer 
program. Computer programs do 
exactly what you tell them to do. 
Give the program the same input, 
and it will produce exactly the same 
“random” digits. Of course, clever 
people have devised computer 
programs that produce output that 
looks like random digits. These are 
called “pseudo-random numbers,” 
and that’s what Table B contains. 
Pseudo-random numbers work fine 
for statistical randomizing, but they 
have hidden nonrandom patterns 
that can mess up more refined uses. 


simple random sample. Writing names on slips of paper can lead to bias if the slips 
of paper are not well mixed or there is a tendency to select them from, say, the top 
or bottom. Drawing names from a hat would be particularly difficult to implement 
if the population size is large, possibly requiring thousands of slips of paper. 


Simple Random Sample 


Asimple random sample (SRS) of size n consists of n individuals from the popula- 
tion chosen in such a way that every set of n individuals has an equal chance to be 
the sample actually selected. 


An SRS not only gives each individual an equal chance to be chosen, but also 
gives every possible sample an equal chance to be chosen. There are other random 
sampling designs that give each individual, but not each sample, an equal chance. 
Exercise 8.43 describes one such design. 

When you think of an SRS, you can still picture the conceptual situation of draw- 
ing names from a hat to remind yourself that an SRS doesn’t favor any part of the pop- 
ulation. That’s why an SRS is a better method of choosing samples than convenience 
or voluntary response sampling. But writing names on slips of paper, mixing them 
well, and drawing them from a hat is slow and inconvenient. That’s especially true if, 
like the Current Population Survey, we must draw a sample of size 60,000. In practice, 
samplers use software. The Simple Random Sample applet makes choosing an SRS very 
fast. If you don’t use the applet or other software, you can randomize by using a table 
of random digits. In fact, software for choosing samples starts by generating random 
digits, so using a table just does by hand what the software does more quickly. 


Random Digits 


Atable of random digits is a long string of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 

with these two properties: 

1. Each entry in the table is equally likely to be any of the 10 digits 0 through 9. 

2. The entries are independent of each other. That is, knowledge of one part of the 
table gives no information about any other part. 


Table B at the back of the book is a table of random digits. Table B begins with 
the digits 19223950340575628713. To make the table easier to read, the digits 
appear in groups of five and in numbered rows. The groups and rows have no 
meaning—the table is just a long list of randomly chosen digits. There are two steps 
in using the table to choose a simple random sample. 


Using Table B to Choose an SRS 


Label: Give each member of the population a numerical label of the same length. 


Table: To choose an SRS, read from Table B successive groups of digits of the length 
you used as labels. Your sample contains the individuals whose labels you find in the 


table. 


You can label up to 100 items with two digits: 01, 02, ..., 99, 00. Up to 1000 items 
can be labeled with three digits, and so on. Always use the shortest labels that will cover 
your population. As standard practice, we recommend that you begin with label 1 (or 
Ol or O01, as needed). Reading groups of digits from the table gives all individuals the 
same chance to be chosen because all labels of the same length have the same chance 
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to be found in the table. For example, any pair of digits in the table is equally likely to 
be any of the 100 possible labels 01, 02, ... , 99, 00. Ignore any group of digits that was 
not used as a label or that duplicates a label already in the sample. You can read digits 
from Table B in any order—across a row, down a column, and so on—because the 
table has no order. As standard practice, we recommend reading across rows. 


A campus newspaper plans a major article on spring break destinations. The report- 
ers intend to call four randomly chosen resorts at each destination to ask about their 
attitudes toward groups of students as guests. Here are the resorts listed in one city: 


O01 Aloha Kai 08 Captiva 15 Palm Tree 22 Sea Shell 

O02 Anchor Down O09 Casadel Mar 16 Radisson 23 Silver Beach 
03 Banana Bay 10 Coconuts 17 Ramada 24 Sunset Beach 
04 Banyan Tree 11. Diplomat 18 Sandpiper 25 Tradewinds 

O5 Beach Castle 12 Holiday Inn 19 Sea Castle 26 Tropical Breeze 
O6 Best Western 13 Lime Tree 20 Sea Club 27 Tropical Shores 
O7 Cabana 14 Outrigger 21 SeaGrape 28 Veranda 


Label: Because two digits are needed to label the 28 resorts, all labels will have 
two digits. We have added labels 01 to 28 in the list of resorts. Always say how you 
labeled the members of the population. To sample from the 1240 resorts in a major 
vacation area, you would label the resorts 0001, O0002,..., 1239, 1240. 


Select sample: To use the Simple Random Sample applet, just enter 28 in the 
“Population =” box and 4 in the “Select a sample of size:” box, and click “Sample.” 
Figure 8.2 shows the result of one sample which contains the resorts labeled 20, 16, 
21, and 10. These are Sea Club, Radisson, Sea Grape, and Coconuts. 


Population = 1to {28 | Sev be 


Select asample ofsize: [4 


To use Table B, read two-digit groups until you have chosen four resorts. Starting 
at line 130 (any line will do), we find 


69051 64817 87174 09517 84534 06489 87201 97245 


Because the labels are two digits long, read successive two-digit groups from the 
table: the first three two-digit groups here are 69, 05, and 16. Ignore groups not 
used as labels, like the initial 69. Also ignore any repeated labels, like the second and 
third 17s in this row, because you can’t choose the same resort twice. Your sample 
contains the resorts labeled O05, 16, 17, and 20. These are Beach Castle, Radisson, 
Ramada, and Sea Club. m 


We can trust results from an SRS, as well as from other types of random samples 

that we will meet later, because the use of impersonal chance avoids bias. Online 

polls and mall interviews also produce samples. We can’t trust results from 

A these samples, because they are chosen in ways that invite bias. The first 
question to ask about any sample is whether it was chosen at random. 


FIGURE 8.2 

The Simple Random Sample applet 
used to choose an SRS of size n = 4 
from a population of size 28. 


Robert Daly/Getty Images 
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“In general, do you think gun control laws should be made more strict, less strict, or 
kept as they are now?” When the New York Times and CBS News asked this question 
of 1644 adults in February 2014, 54% said “More,” 9% said “Less,” and 36% said 
“Kept as they are now.” Can we trust the opinions of this sample to fairly represent 
the opinions of all adults? Here’s part of the statement by the Times on how the poll 
was conducted: 


The latest New York Times/CBS News poll is based on telephone interviews conducted 
February 19 through February 23 with 1644 adults throughout the United States. 

The sample of land line telephone exchanges called was randomly selected by a 
computer from a complete list of more than 81,000 active residential exchanges across 
the country. The exchanges were chosen so as to ensure that each region of the country 
was represented in its proper proportion. 

Within each exchange, random digits were added to form a complete telephone number, 
thus permitting access to listed and unlisted numbers alike. Within each household, one 
adult was designated by a random procedure to be the respondent for the survey. 

Cellphone numbers were generated by a similar random process. The two samples 
were then combined and adjusted to assure the proper ration of landine-only, cellphone- 
only, and dual phone users." 


The selection of landline phone numbers is a good description of a common method 

random digit dialing for choosing national samples, called random digit dialing. Although the informa- 
tion about the conduct of the poll provides the reader with some basic details, data 
collection and analysis of a national survey is a great deal more complex than this 
short description suggests. We'll come back to random digit dialing and its problems 
later (see Exercise 8.16), but this statement on the conduct of the poll does contain 
important information. We know the size of the sample, when the poll was taken, and 
that the comforting word “random” appears four times. m 


() LaunchPad Online Resources 


* The StatClips video, Data and Sampling, discusses the importance of simple 
random samples, and the StatClips Examples video, Data and Sampling: 
Example C, provides an illustration of using a table of random numbers to 
select an SRS. 


Apply Your Knowledge 


8.6 Apartment Living. You are planning a report on apartment living in a college 
town. You decide to select three apartment complexes at random for in-depth 
interviews with residents. Use the Simple Random Sample applet, other software, 
or Table B to select a simple random sample of four of the following apartment 
complexes. If you use Table B, start at line 128. 


Ashley Oaks Country View Mayfair Village 
Bay Pointe Country Villa Nobb Hill 

Beau Jardin Crestview Pemberly Courts 
Bluffs Del-Lynn Peppermill 
Brandon Place Fairington Pheasant Run 
Briarwood Fairway Knolls River Walk 
Brownstone Fowler Sagamore Ridge 
Burberry Place Franklin Park Salem Courthouse 
Cambridge Georgetown Village Square 


Chauncey Village Greenacres Waterford Court 


8.4 Inference About the Population 


8.7. Minority Managers. A firm wants to understand the attitudes of its minor- 
ity managers toward its system for assessing management performance. Fol- 
lowing is a list of all the firm’s managers who are members of minority groups. 
Use the Simple Random Sample applet, other software, or Table B at line 141 
to choose five managers to be interviewed in detail about the performance 
appraisal system. 


Adelaja Draguljic Huo Modur 
Ahmadiani Fernandez Ippolito Rettiganti 
Barnes Fox Jiang Rodriguez 
Bonds Gao Jung Sanchez 
Burke Gemayel Mani Sgambellone 
Deis Gupta Mazzeo Yajima 

Ding Hernandez 


8.8 Sampling Gravestones. The local genealogical society in Coles County, 
Illinois, has compiled records on all 55,914 gravestones in cemeteries in the 
county for the years 1825 to 1985. Historians plan to use these records to learn 
about African Americans in Coles County’s history. They first choose an SRS 
of 395 records to check their accuracy by visiting the actual gravestones.’ 


(a) How would you label the 55,914 records? 
(b) Use software or Table B, starting at line 137, to choose the first five 


records for the SRS. oe 


The purpose of a sample is to give us information about a larger population. The 
process of drawing conclusions about a population on the basis of sample data is 
called inference because we infer information about the population from what we 
know about the sample. 

Inference from convenience samples or voluntary response samples would be 
misleading because these methods of choosing a sample are biased. We are almost 
certain that the sample does not fairly represent the population. The first reason to 
rely on random sampling is to eliminate bias in selecting samples from the list of available 
individuals. 

Nonetheless, it is unlikely that results from a random sample are exactly the same 
as for the entire population. Sample results, like the unemployment rate obtained 
from the monthly Current Population Survey, are only estimates of the truth about 
the population. If we select two samples at random from the same population, we 
will almost certainly draw different individuals. So the sample results will differ 
somewhat, just by chance. Properly designed samples avoid systematic bias, but their 
results are rarely exactly correct, and they vary from sample to sample. 

Why can we trust random samples? The big idea is that the results of random 
sampling don’t change haphazardly from sample to sample. Because we deliberately 
use chance, the results obey the laws of probability that govern chance behavior. 
These laws allow us to say how likely it is that sample results are close to the truth 
about the population. The second reason to use random sampling is that the laws of 
probability allow trustworthy inference about the population. Results from random 
samples come with a margin of error that sets bounds on the size of the likely error. 
How to do this is part of the technique of statistical inference. We will describe the 
reasoning in Chapter 16 and present details throughout the rest of the book. 

One point is worth making now: larger random samples give more accurate results 
than smaller random samples. By taking a very large sample, you can be confident 
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that the sample result is very close to the truth about the population. The Current 
Population Survey contacts about 60,000 households, so it estimates the national 
unemployment rate very accurately. Opinion polls that contact 1000 or 1500 people 
give less accurate results. Of course, only samples chosen by chance carry this 
guarantee. Lou Dobbs’s online sample tells us little about overall American public 
opinion even though 7350 people clicked a response. 


Apply Your Knowledge 


Fd GOLFING AT 
Lit) RANDOM 


Random drawings give everyone 
the same chance to be chosen, so 
they offer a fair way to decide who 
gets a scarce good—like a round of 
golf. Lots of golfers want to play the 
famous Old Course at St. Andrews, 
Scotland. Some can reserve in 
advance, at considerable expense. 
Most must hope that chance favors 
them in the daily random drawing 
for tee times. At the height of the 
summer season, only one in six wins 
the right to pay $250 for a round. 


8.9 Ask More People. In the 2012 presidential pre-election surveys, Pew Re- 
search sampled 1,112 likely voters during October 4-7, 2012, and asked if 
they were planning to vote for Obama, and then asked the same question of 
a sample of 1,495 likely voters taken from October 24-28, 2012. However, in 
their last survey taken October 31-November 3, 2012, just before the elec- 
tion held on November 6, 2012, they asked this question of a sample of 2,709 
likely voters. Why do you think Pew did this? 


8.10 How Accurate Is the Poll? The New York Times/CBS News poll conducted 
during February 19-23, 2014, included 1644 adults, of which 519 were Re- 
publican, 515 were Democrats, 550 were Independent, and 60 didn’t know or 
didn’t respond.’ Each person sampled was asked their opinion on a variety of 
issues facing the nation, such as, “Do you feel that the distribution of money 
and wealth in this country is fair, or do you feel that the money and wealth 
in this country should be more evenly distributed among more people?” The 
margin of error (we will give more detail in later chapters) was reported as 
+3% for the entire sample. When considering the opinions of only the Re- 
publicans in the sample, the margin of error was reported as +6%. What do 
you think explains the fact that estimates for Republicans were less precise 
than for the entire sample? 


Random sampling, the use of chance to select the sample, is the essential principle 
of statistical sampling. Designs for random sampling from large populations spread 
out over a wide area are usually more complex than an SRS. For example, it is com- 
mon to sample important groups within the population separately, then combine 
these samples. This is the idea of a stratified random sample. 


Stratified Random Sample 


To select a stratified random sample, first classify the population into groups of 
similar individuals, called strata. Then, choose a separate SRS in each stratum and 
combine these SRSs to form the full sample. 


Choose the strata based on facts known before the sample is taken. For example, 
a population of election districts might be divided into urban, suburban, and rural 
strata. A stratified design can produce more precise information than an SRS of 
the same size by taking advantage of the fact that individuals in the same stratum 
are similar to one another. 


Most large-scale sample surveys use a multistage sample. For example, the opinion 
poll described in Example 8.5 has three stages: choose a random sample of telephone 
exchanges (stratified by region of the country), then an SRS of household telephone 
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Each state conducts an annual survey of seat belt use by drivers, following guidelines 
set by the federal government. The guidelines require random sampling. Seat belt 
use is observed at randomly chosen road locations at random times during daylight 
hours. The locations are not an SRS of all locations in the state, but rather a stratified 
sample using the state’s counties as strata. 

In Hawaii, the counties are the islands that make up the state’s territory. The seat 
belt survey sample consists of 135 road locations in the four most populated islands: 
66 in Oahu, 24 in Maui, 23 in Hawaii, and 22 in Kauai. The sample sizes on the 
islands are proportional to the amount of road traffic.* = 


numbers within each exchange, and then a random adult in each household. 


Analysis of data from sampling designs more complex than an SRS takes us 
beyond basic statistics. But the SRS is the building block of more elaborate designs, 
and analysis of other designs differs more in complexity of detail than in fundamen- 


tal concepts. 


Apply Your Knowledge 


8.11 


8.12 


Sampling Metro Chicago. Cook County, Illinois, has the second-largest 
population of any county in the United States (after Los Angeles County, 
California). Cook County has 30 suburban townships and an additional eight 
townships that make up the city of Chicago. The suburban townships are 


Barrington Elk Grove Maine Orland Riverside 
Berwyn Evanston New Trier Palatine Schaumburg 
Bloom Hanover Niles Palos Stickney 
Bremen Lemont Northfield Proviso Thornton 
Calumet — Leyden Norwood Park Rich Wheeling 
Cicero Lyons Oak Park River Forest Worth 


The Chicago townships are 


Hyde Park Lake North Chicago South Chicago 
Jefferson Lake View Rogers Park West Chicago 


Because city and suburban areas may differ, the first stage of a multistage sample 
chooses a stratified sample of five suburban townships and three of the more 
heavily populated Chicago townships. Use software, the Simple Random Sample 
applet, or Table B to choose this sample. (If you use Table B, assign labels in al- 
phabetical order and start at line 116 for the suburbs and at line 126 for Chicago.) 


Academic Dishonesty. A study of academic dishonesty among college 
students used a two-stage sampling design. The first stage chose a sample of 
30 colleges and universities. Then, the study authors mailed questionnaires 
to a stratified sample of 200 seniors, 100 juniors, and 100 sophomores at each 
school.? One of the schools chosen has 1127 freshmen, 989 sophomores, 943 
juniors, and 895 seniors. You have alphabetical lists of the students in each 
class. Explain how you would assign labels for stratified sampling. Then use 
software or Table B, starting at line 140, to select the first five students in 
the sample from each stratum. After selecting five students for a stratum, 
continue to select the students for the next stratum. 


Ryan McVay/Getty Images 
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FIGURE 8.3 

The home page of the General Social 
Survey at the University of Chicago's 
National Opinion Research Center. 
The GSS has tracked opinions about a 
wide variety of issues since 1972. 


Random selection eliminates bias in the choice of a sample from a list of the 
population. When the population consists of human beings, however, accurate 
information from a sample requires more than a good sampling design. 

To begin, we need an accurate and complete list of the population. Because such 
a list is rarely available, most samples suffer from some degree of undercoverage. A 
sample survey of households, for example, will miss not only homeless people but 
also prison inmates and students in dormitories. An opinion poll conducted by call- 
ing landline telephone numbers will miss households that have only cell phones as 
well as households without a phone. The results of national sample surveys therefore 
have some bias if the people not covered differ from the rest of the population. 

A more serious source of bias in most sample surveys is nonresponse, which 
occurs when a selected individual cannot be contacted or refuses to cooperate. 
Nonresponse to sample surveys often exceeds 50%, even with careful planning 
and several callbacks. Because nonresponse is higher in urban areas, most sample 
surveys substitute other people in the same area to avoid favoring rural areas in the 
final sample. If the people contacted differ from those who are rarely at home or 
who refuse to answer questions, some bias remains. 


Undercoverage and Nonresponse 


Undercoverage occurs when some groups in the population are left out of the pro- 
cess of choosing the sample. 


Nonresponse occurs when an individual chosen for the sample can’t be contacted 
or refuses to participate. 


The U.S. Census Bureau's American Community Survey (ACS) has the lowest nonre- 
sponse rate of any poll we know: in 2012, only about 1.2% of the households in the 
sample refused to respond; the overall nonresponse rate, including “never at home” 
and other causes, was just 2.7%.° This monthly survey of almost 300,000 housing 
units replaces the “long form” that in the past was sent to some households in the 
every-10-years national census. Participation in the ACS is mandatory, and the U.S. 
Census Bureau follows up by telephone and then in person if a household fails to 
return the mail questionnaire. 

The University of Chicago’s General Social Survey (GSS) is the nation’s most 
important social science survey. (See Figure 8.3.) The GSS contacts its sample in 
person, and it is run by a university. Its response rates are about 70%, among the 
highest in the world. 


f@ Session 


GSS General Social SULVEY 


The General Social Survey (GSS) conducts basic scientific research on 
the structure and development of American society with a data-collection 
program designed to both monitor social change within the United States 
and to compare the United States to other nations. 


GSS General Social Survey 
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What about opinion polls by news media and opinion-polling firms? We don’t know 
the rates of nonresponse for many of these surveys because they don’t say, which 
in itself is a bad sign. In a December 2013 survey on social media use, the Pew 
Research Center provided a full disposition of the sampled phone numbers. Here are 
the details. Initially, 40,985 landline and 27,000 cell phones were dialed, with 11,260 
of the landlines and 15,758 of the cell numbers being working numbers. Of these 
27,018 working numbers, they were able to contact 17,335, or about 65%, as a large 
portion of the calls went to voicemail. Among those contacted, about 15% cooper- 
ated. Of those cooperating, some numbers were ineligible due to language barriers 
or contacting a child’s cell phone, and some calls were eventually broken off without 
being completed. To sum it up, the 27,018 working numbers dialed resulted in a final 
sample of 1,801, giving a response rate of about 9%, an estimate of the fraction of 
all eligible respondents in the sample who were ultimately interviewed.’ m 


In addition, the behavior of the respondent or of the interviewer can cause 
response bias in sample results. People know that they should take the trouble to —_— response bias 
vote, for example, so many who didn’t vote in the last election will tell an inter- 
viewer that they did. The race or sex of the interviewer can influence responses to 
questions about race relations or attitudes toward feminism. Answers to questions 
that ask respondents to recall past events are often inaccurate because of faulty 
memory. For example, many people “telescope” events in the past, bringing them 
forward in memory to more recent time periods. “Have you visited a dentist in the 
last six months?” will often draw a “yes” from someone who last visited a dentist 
eight months ago.® Careful training of interviewers and careful supervision to avoid 
variation among the interviewers can reduce response bias. Good interviewing 
technique is another aspect of a well-done sample survey. 
Wording effects are the most important influence on the answers given to a _— wording effects 
sample survey. Confusing or leading questions can introduce strong bias, and varia- 
tions in wording can greatly change a survey’s outcome. Even the order in which 
questions are asked matters. Here are some examples.’ 


How do Americans feel about illegal immigrants? “Should illegal immigrants be pros- 
ecuted and deported for being in the United States illegally, or shouldn't they?” Asked 
this question in an opinion poll, 69% favored deportation. But when the very same 
sample was asked whether illegal immigrants who have worked in the United States 
for two years “should be given a chance to keep their jobs and eventually apply 
for legal status,” 62% said that they should. Different questions give quite different 
impressions of attitudes toward illegal immigrants. 

What about government help for the poor? Only 13% think we are spending too much 
on “assistance to the poor,” but 44% think we are spending too much on “welfare.” m 


Ask a sample of college students these two questions: 
“How happy are you with your life in general?” (Answers on a scale of 1 to 5) 
“How many dates did you have last month?” 


The correlation between answers is r = —O0.012 when asked in this order. It appears 
that dating has little to do with happiness. Reverse the order of the questions, how- 
ever, and r = 0.66. Asking a question that brings dating to mind makes dating suc- 
cess a big factor in happiness. = 
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Dont trust the results of a sample survey until you have read the exact ques- 
tions asked. The amount of nonresponse and the date of the survey are also impor- 


tant. Good statistical design is a part, but only a part, of a trustworthy survey. 


Apply Your Knowledge 


‘ity DO NOT CALL! 
i People who do sample 


surveys hate telemarketing. We all 
get so many unwanted sales pitches 
by phone that many people hang 

up before learning that the caller 

is conducting a survey rather than 
selling vinyl siding. You can eliminate 
calls from commercial telemarketers 
by placing your phone number on 
the National Do Not Call Registry. 
Just go to www. donotcall.gov 
to sign up. 


8.13 A Survey of 100,000 Physicians. In 2010, the Physicians Foundation 
conducted a survey of physicians’ attitude about health care reform, calling 
the report “a survey of 100,000 physicians.” The survey was sent to 100,000 
randomly selected physicians practicing in the United States: 40,000 via 
post-office mail and 60,000 via email. A total of 2,379 completed surveys 
were received. !° 


(a) State carefully what population is sampled in this survey and what is 
the sample size. Could you draw conclusions from this study about all 
physicians practicing in the United States? 

(b) What is the rate of nonresponse for this survey? How might this affect 
the credibility of the survey results? 


(c) Why is it misleading to call the report “a survey of 100,000 physicians”? 


8.14 Gays in the Military. In 2010, a Quinnipiac University Poll and a CNN Poll 
each asked a nationwide sample about their views on openly gay men and 
women serving in the military.'' Here are the two questions: 


Question A: Federal law currently prohibits openly gay men and women from 
serving in the military. Do you think this law should be repealed or not? 
Question B: Do you think people who are openly gay or homosexual should or 
should not be allowed to serve in the U.S. military? 


One of these questions had 78% responding “should,” and the other question 
had only 57% responding “should.” Which wording is slanted toward a more 
negative response on gays in the military? Why? 


A few national sample surveys, including the General Social Survey and the U.S. 
government’s American Community Survey and Current Population Survey, inter- 
view some or all of their subjects in person. This is expensive and time-consuming, 
so most national surveys contact subjects by telephone using the random digit dial- 
ing (RDD) method described in Example 8.5. Technology, especially the spread of 
cell phones, is making traditional RDD methods outdated. 

First, call screening is now common. A large majority of American households 
have answering machines, voicemail, or caller ID, and many use these methods to 
screen their calls. Calls from polling organizations are rarely returned. 

More seriously, the percent of cell-phone-only households is increasing rapidly. By 
mid-2007, 14% of American households had a cell phone but no landline phone; by 
the end of 2009, that percent had increased to almost 25%; and in 2012, the percent 
was almost 36%. It’s clear from these numbers that RDD reaching only landline 
numbers is in trouble. Can surveys just add cell phone numbers? Not easily. Federal 
regulations prohibit automated dialing to cell phones, which rules out computerized 
RDD sampling and requires hand dialing of cell phone numbers, which is expen- 
sive. A cell phone can be anywhere, and many people keep their cell number despite 
moving, so stratifying by location becomes difficult. And a cell phone user may be 
driving or otherwise unable to talk safely. 


8.7 The Impact of Technology 


People who screen calls and people who have only a cell phone tend to be 
younger than the general population. In 2012, just over 60% of adults aged 25 to 
29 years lived in households with no landline phone. So RDD surveys using only 
landlines may be biased (see Exercise 8.16). Careful surveys weight their responses 
to reduce bias. For example, if a sample contains too few young adults, the responses 
of the young adults who do respond are given extra weight. But with response rates 
steadily dropping and cell-phone-only use steadily growing, the future of RDD land- 
line telephone surveys is not promising. Many polling organizations now include a 
minimum quota of cell phone users in their samples to help adjust for bias!” (see 
Example 8.5 and Exercise 8.45). 

One alternative is to use web surveys, an increasingly popular survey method, 
rather than telephone surveys. Web surveys have several advantages over more tra- 
ditional survey methods. It is possible to collect large amounts of survey data at lower 
costs than traditional methods allow. Anyone can put survey questions on dedicated 
sites offering free services; thus large-scale data collection is available to almost 
every person with access to the Internet. Furthermore, web surveys allow delivery of 
multimedia survey content to respondents, opening up new realms of survey pos- 
sibilities that would be extremely difficult to implement using traditional methods. 
Some argue that eventually web surveys will replace traditional survey methods. 

Although web surveys are easy to do, they are not easy to do well. Three major 
problems are voluntary response, undercoverage, and nonresponse. Voluntary 
response appears in several forms in online surveys. Example 8.3 is a survey that 
invited individuals to a particular website to participate in a poll. Other web surveys 
solicit participation through announcements in news groups, email invitations, and 
banner ads on high-traffic sites. 


The Renfrew Center Foundation, a nonprofit charitable organization, reports that 44% 
of women have negative feelings when they don’t wear makeup.'* How did they come 
up with this number? It is based on an online survey of 1292 women, 18 years of 
age and older, conducted by Harris Interactive during December 20-22, 2011. This 
is a more sophisticated example of voluntary response. It occurs when the polling 
organization, in this case Harris Interactive, maintains a panel of volunteers. The panel 
is recruited to fill out questionnaires on the Internet, often in exchange for points 
redeemable for cash and gifts. Panelists join by going to the polling company website 
and filling out some personal information later used to select them for specific surveys. 
Harris Interactive uses an online research panel of over 6 million volunteers worldwide, 
from which it selected the sample for the Renfrew Center Foundation poll. 

In its report on online panels,'® the American Association of Public Opinion 
Research states, “Researchers should avoid nonprobability online panels when one of 
the research objectives is to accurately estimate population values. From a total survey 
error perspective, the principal source of error in estimates from these types of sample 
sources is a combination of the lack of Internet access in roughly one in three U.S. 
households and the self-selection bias inherent in the panel recruitment processes.” It 
also states that, “The reporting of a margin of sampling error associated with an opt- 
in or self-identified sample is misleading.” In the methodology section of the poll, the 
Renfew Center Foundation does state, “This online survey is not based on a probability 
sample, and therefore, no estimate of theoretical sampling error can be calculated,” 
although the issue of potential bias in the estimate is not addressed. m 


Undercoverage is a serious problem for even careful web surveys, because about 
25% of Americans lack Internet access and only about 70% have broadband access. 
People without Internet access are more likely to be poor, elderly, minority, or 
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rural than the overall population, so the potential for bias in a web survey is clear. 
There is no easy way to choose a random sample even from people with web access 
because there is no technology that generates personal email addresses at random 
in the way that RDD generates residential telephone numbers, and individuals may 
have several email addresses. Even if such technology existed, etiquette and regula- 
tions aimed at spammers would prevent mass emailing. For the present, web surveys 
work well only for restricted populations, for example, surveying students at your 
university using the school’s list of student email addresses. Here is an example of 
a successful web survey. 


A placebo is a dummy treatment such as a pill that has no direct effect on a patient 
but may bring about a response because patients expect it to. Do academic physi- 
cians who maintain private practices sometimes give their patients placebos? A web 
survey of doctors in internal medicine departments at Chicago-area medical schools 
was possible because almost all the doctors had listed email addresses. 

An email was sent to each doctor, explaining the purpose of the study, promising 
anonymity, and giving an individual web link for response. In all, 231 of 443 doctors 
responded. The response rate was helped by the fact that the email came from a 
team at a medical school. Result: 45% said they sometimes used placebos in their 
clinical practice.'® m 


Apply Your Knowledge 


8.15 NPR Facebook Survey. In 2010, National Public Radio (NPR) conducted 
a survey of preferences and habits of its Facebook fans by recruiting respon- 
dents through messages posted on its Facebook page. The survey was con- 
ducted online and deployed July 12-19. A total of 40,043 respondents began 
the survey, with 33,304 completing all questions. It was found that people 
accessed NPR on the radio, at NPR.org, through iPhone apps, and several 
other platforms. Asked about time spent with NPR, about 20% of respon- 
dents indicated that they spent more than three hours per day, including 
radio listening. 


(a) Here is what NPR says about the survey methodology: “Respondents 
were self-selected and the resulting sample is non-random—therefore 
a margin of error cannot be calculated, and the survey results cannot 
be projected to any population other than the sample itself.”!” Why 
can’t inference about any population be made? 


(b) Suppose that people who spent more time with NPR were more likely 
to respond to the survey. Do you think the true percentage of NPR’s 
Facebook fans who spend more than three hours with NPR is higher or 
lower than the 20% found from the survey? Explain why. 


8.16 More on Random Digit Dialing. In the first half of 2013, about 38% of 
adults lived in households with a cell phone and no landline phone. Among 
adults aged 25 to 29, this percent was about 65%, while among adults over 
65, the percent was only 13%."® 


(a) Write a survey question for which the opinions of adults with landline 
phones only are likely to differ from the opinions of adults with cell 
phones only. Give the direction of the difference of opinion. 


(b) For the survey question in part (a), suppose a survey was conducted 
using random digit dialing of landline phones only. Would the results 
be biased? What would be the direction of bias? 


(c) Most surveys now supplement the landline sample contacted by RDD 
with a second sample of respondents reached through random dial- 
ing of cell phone numbers. The landline respondents are weighted to 
take account of household size and number of telephone lines into the 
residence, whereas the cell phone respondents are weighted according 
to whether they were reachable only by cell phone or also by landline. 
Explain why it is important to include both a landline sample and a 
cell phone sample. Why is the number of telephone lines into the resi- 
dence important? (Hint: How does the number of telephone lines into 
the residence affect the chance of the household being included in the 
RDD sample?) 


CHAPTER 8 SUMMARY 


Chapter Specifics 


= A sample survey selects a sample from the population of all individuals about which 
we desire information. We base conclusions about the population on data from the 
sample. It is important to specify exactly what population you are interested in and 
what variables you will measure. 


m The design of a sample describes the method used to select the sample from the popu- 
lation. Random sampling designs use chance to select a sample. 


mu The basic random sampling design is a simple random sample (SRS). An SRS gives 
every possible sample of a given size the same chance to be chosen. 


m= Choose an SRS by labeling the members of the population and using random digits to 
select the sample. Software can automate this process. 


m To choose a stratified random sample, classify the population into strata, groups of 
individuals that are similar in some way that is important to the response. Then choose 
a separate SRS from each stratum. 


m Failure to use random sampling often results in bias, or systematic errors in the way the 
sample represents the population. Voluntary response samples, in which the respon- 
dents choose themselves, are particularly prone to large bias. 


= In human populations, even random samples can suffer from bias due to undercover- 
age or nonresponse, from response bias, or from misleading results due to poorly 
worded questions. Sample surveys must deal expertly with these potential problems in 
addition to using a random sampling design. 


= Most national sample surveys are carried out by telephone, using random digit dialing 
to choose residential telephone numbers at random. Because call screening is increas- 
ing nonresponse to such surveys, and the rise of cell-phone-only households is increas- 
ing undercoverage, many surveys include a minimum quota of cell phone users in their 
samples to help adjust for bias. Web surveys are becoming more frequent, but many 
suffer from volunteer response, undercoverage, and nonresponse. 


Link It 


The methods of Chapters | to 6 can be used to describe data regardless of how the data 
were obtained. However, if we want to reason from data to give answers to specific ques- 
tions or to draw conclusions about the larger population, then the method that was used to 
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collect the data is important. Sampling is one way to collect data, but it does not guarantee 
that we can draw meaningful conclusions. Biased sampling methods, such as convenience 
sampling and voluntary response samples, produce data that can be misleading, resulting 
in incorrect conclusions. Simple random sampling avoids bias and produces data that can 
lead to valid conclusions regarding the population. Even with perfect sampling methods, 
there is still sample-to-sample variation; we will begin our study of the connection between 
sampling variation and drawing conclusions in Chapter 15. 

Even when we take a simple random sample, our conclusions can be weakened by un- 
dercoverage, nonresponse, and poor wording of questions. Careful attention must be given 
to all aspects of the sampling process to ensure that the conclusions we make are valid. In 
some cases, more complex designs are required, such as stratified sampling or multistage 
sampling. But the use of impersonal chance to select the sample remains a key ingredient 
in the sampling process. And issues such as undercoverage and nonresponse still remain for 
these more complex designs. 

The next chapter is about statistical designs for experiments, a quite different means of 
producing data than sample surveys. 


() LaunchPad Online Resources 


If you are having difficulty with any of the sections of this chapter, these online 
resources should help prepare you to solve the exercises at the end of this chapter. 


* StatTutor starts with a video review of each section and asks a series of questions 
to check your understanding. 


* LearningCurve provides you with a series of questions about the chapter 
geared to your level of understanding. 


CHECK YOUR SKILLS 
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An online store contacts 1000 customers from its list 
of customers who have purchased in the last year. In 
all, 696 of the 1000 say that they are very satisfied with 
the store’s website. The population in this setting is 


(a) all customers who have purchased something in 
the last year. 

(b) the 1000 customers contacted. 

(c) the 696 customers who were very satisfied with 
the store’s website. 


A state representative wants to know how voters in his 
district feel about enacting a statewide smoking ban in 
all enclosed public places, including bars and restau- 
rants, as well as several other current statewide issues. 
He mails a questionnaire addressing these issues to an 
SRS of 800 voters in his district. Of the 800 question- 
naires mailed, 152 were returned. The sample is 


(a) the 800 voters receiving the questionnaire. 
(b) the 152 voters returning the questionnaire. 
(c) all voters in his district. 


In the survey for Exercise 8.18, there are 8741 
registered voters in his district. You label the voters 
0001 to 8741 in alphabetical order. Using line 123 of 
Table B to select the sample, the first five voters in 
your sample would be 


(a) 5458, 0815, 7271, 2560, 2755. 
(b) 5458, 0815, 0727, 1025, 6027. 
(c) 5458, 8150, 7271, 2560, 2755. 


The website www. twiigs.com allows you to vote on 
polls that interest you or to post one of your own. Once 
you have found a poll of interest, you just click on “Vote,” 
and your response becomes part of the sample. One of 
the questions in July 2010 was, “How many times have 
you been pulled over by the police?” Of the 780 people 
responding, 70% said “1—5 times.” You can conclude that 


(a) about 70% of Americans have been pulled over 
by the police “1—5 times.” 

(b) the poll uses voluntary response, so the results tell 
us little about the population of all adults. 

(c) more people still need to vote on the question, as 
a larger sample is required to reduce bias. 
Archaeologists plan to examine a sample of two-meter- 
square plots near an ancient Greek city for artifacts 
visible in the ground. They choose separate samples 


In all exercises asking for an SRS, you may use software, the Simple 
Random Sample applet, or Table B. 
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Environmental problems. A Gallup Poll found that 
Americans, when asked about a list of environmental 
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of plots from floodplain, coast, foothills, and high hills. 
What kind of sample is this? 


(a) A simple random sample 
(b) A stratified random sample 
(c) A voluntary response sample 


You must choose an SRS of 10 of the 440 retail outlets in 
New York that sell your company’s products. How would 
you label this population to select a simple random sample? 


(a) 001, 002, 003,..., 439, 440 

(b) 000, 001, 002,..., 439, 440 

(c) 1,2,..., 439, 440 

You are using the table of random digits to choose a 
simple random sample of six students from a class of 
30 students. You label the students 01 to 30 in alphabetical 
order, and then select a simple random sample. Which of 
the following is a possible sample that could be obtained? 


(a) 45, 74, 04, 18, 07, 65 
(b) 04, 18, 07, 13, 02, 07 
(c) 04, 18, 07, 13, 02, 05 


A sample of households in a community is selected at 
random from the telephone directory. In this commu- 
nity, 4% of households have no telephone, 10% have 
only cell phones, and another 25% have unlisted tele- 
phone numbers. The sample will certainly suffer from 


(a) nonresponse. 
(b) undercoverage. 
(c) false responses. 


The Pew Research Centers Report entitled “How 
Americans value public libraries in their communities,” 
released December 11, 2013, asked a random sample 
of 6224 Americans aged 16 and over, “Have you used 
a Public Library website in the last 12 months?” In the 
entire sample, 30% said “yes.” But only 17% of those 
in the sample over 65 years of age said “yes.” Which of 
these two sample percents will be more accurate as an 
estimate of the truth about the population? 


(a) The result for those over 65 is more accurate 
because it is easier to estimate a proportion for a small 
group of people. 

(b) The result for the entire sample is more accurate 
because it comes from a larger sample. 

(c) Both are equally accurate because both come 
from the same sample. 


problems, were increasingly worried about problems 
such as pollution of drinking water, soil, air, and wa- 
terways, but were least worried about climate change 
or global warming. Gallup’s report said, “Results are 
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based on telephone interviews conducted March 6-9, 
2014, with a random sample of 1,048 adults, aged 18 
and older, living in all 50 U.S. states and the District 
of Columbia.” What is the population for this sample 
survey? What is the sample? 


Sampling stuffed envelopes. A large retailer prepares 
its customers’ monthly credit card bills using an au- 
tomatic machine that folds the bills, stuffs them into 
envelopes, and seals the envelopes for mailing. Are 
the envelopes completely sealed? Inspectors choose 40 
envelopes from the 1000 stuffed each hour for visual 
inspection. What is the population for this sample sur- 
vey? What is the sample? 


Stormwater pollution. The administrators of the city’s 
Stormwater Program are interested in evaluating the 
public’s knowledge of the causes and effects of stormwa- 
ter pollution. They set up an information booth at the 
city’s Earth Day celebration, and people who visit the 
booth are offered the opportunity to complete a survey 
on their knowledge of stormwater pollution. 


(a) If the population of interest is adult residents of 
the city, discuss any sources of bias that might limit 
the usefulness in using this survey to draw conclu- 
sions about this population. 

(b) The survey at the Earth Day celebration was 
intended to be used as a pilot study to make sure the 
questions were not ambiguous. The full survey is to 
be an online survey. Residents are to be made aware 
of the survey through information mailed with their 
utility bills, and asked to go online to complete the 
survey. Discuss any sources of bias that might be 
contained in the full survey. 

(c) Suppose we select a random sample of utility 
customers. Only the residents in the sample are to be 
made aware of the survey through information mailed 
with their utility bills, and asked to go online to com- 
plete the survey. Does this have any advantages or 
disadvantages over the full survey in part (b) in terms 
of possible bias? Explain. 


Sampling telephone area codes. The United States 
currently has approximately 287 Numbering Plan Ar- 
eas (NPAs) in service, corresponding to geographic 
regions. Each NPA is identified by a three-digit code, 
commonly called an area code. (More are created reg- 
ularly.)?° You want to choose an SRS of 20 of these 
area codes for a study of available telephone numbers. 
Label the codes 001 to 287, and use the Simple Random 
Sample applet or other software to choose your sample. 
(If you use Table B, start at line 122 and choose only 
the first five area codes in the sample.) 


Paying taxes. In April 2014, a Gallup Poll asked two 
questions about the amount one pays in federal in- 
come taxes.”! Here are the two questions: 
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Question A: Do you regard the income tax which 
you will have to pay this year as fair? 


Question B: Do you consider the amount of federal 
income tax you have to pay as too high, about right, 
or too low? 


One of these questions drew 52% saying the amount 
was fair or about right; the other, only 41%. Which 
wording produced the higher percentage? Why? 


Wording of questions. Do the opinions of Canadians 
and Americans differ on the right of citizens to bear 
arms? Although the population of the U.S. is about 
10 times that of Canada, the estimated number of pri- 
vately owned firearms is 25 times higher in the U.S. 
When Gallup asked a random sample of 1002 American 
adults the question, “Do you think there should or 
should not be a law that would ban the possession of 
handguns, except by the police and other authorized 
persons?”, 35% said there should be a law banning pos- 
session of handguns. At the same time, when Gallup 
asked a random sample of 1011 Canadians the ques- 
tion, “Do you think the general public should be al- 
lowed by law to own a gun?”, 63% said “no.” What 
problems do you see in using the percentages in these 
two surveys to make a comparison on the opinion on 
the right to bear arms in the two countries? Explain. 


Movie viewing. An opinion poll calls 2000 ran- 
domly chosen residential telephone numbers and 
asks to speak with an adult member of the house- 
hold. The interviewer asks, “How many movies 
have you watched in a movie theater in the past 
12 months?” 


(a) What population do you think the poll has in mind? 
(b) In all, 831 people respond. What is the rate (percent) 
of nonresponse? 

(c) What source of response error is likely for the 
question asked? 


Online polls. Example 8.3 reports an online poll in 
which 97% of the respondents opposed issuing driver’s 
licenses to illegal immigrants. National random sam- 
ples taken at the same time showed about 70% of the 
respondents opposed to such licenses. Explain briefly 
to someone who knows no statistics why the random 
samples report public opinion more reliably than the 
online poll. 


Nonresponse. Academic sample surveys, unlike com- 
mercial polls, often discuss nonresponse. A survey of 
drivers began by randomly sampling all listed residen- 
tial telephone numbers in the United States. Of 45,956 
calls to these numbers, 5029 were completed.” What 
was the rate of nonresponse for this sample? (Only one 
call was made to each number. Nonresponse would be 
lower if more calls were made.) 


8.35 Running red lights. The sample described in 
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Exercise 8.34 produced a list of 5024 licensed 
drivers. The investigators then chose an SRS of 
880 of these drivers to answer questions about their 
driving habits. 


(a) How would you assign labels to the 5024 drivers? 
Choose the first five drivers in the sample. If you use 
Table B, start at line 114. 

(b) One question asked was, “Recalling the last ten 
traffic lights you drove through, how many of them 
were red when you entered the intersections?” Of the 
880 respondents, 171 admitted that at least one light 
had been red. A practical problem with this survey 

is that people may not give truthful answers. What 

is the likely direction of the bias: do you think more 
or fewer than 171 of the 880 respondents really ran a 
red light? Why? 


Seat belt use. A study in El Paso, Texas, looked at seat 
belt use by drivers. Drivers were observed at randomly 
chosen convenience stores. After they left their cars, 
they were invited to answer questions that included 
questions about seat belt use. In all, 75% said they al- 
ways used seat belts, yet only 61.5% were wearing seat 
belts when they pulled into the store parking lots.” 
Explain the reason for the bias observed in responses 
to the survey. Do you expect bias in the same direction 
in most surveys about seat belt use? 


Sampling at a party. At a large block party, there 
are 40 men and 30 women. You want to ask opinions 
about how to improve the next party. You choose at 
random four of the men and separately choose at ran- 
dom three of the women to interview. 


(a) What is the probability that any of the 40 men is 

in your random sample of four men to be interviewed? 
What is the probability that any of the 30 women is in 
your random sample of three women to be interviewed? 
(b) If you have done the calculations correctly in 
part (a), the probability of any person at the party 
being interviewed is the same. Why is your sample 
of seven men and women not an SRS of people from 
the party? 


Ring-no-answer. A common form of nonresponse in 
telephone surveys is “ring-no-answer.” That is, a call 
is made to an active number but no one answers. The 
Italian National Statistical Institute looked at nonre- 
sponse to a government survey of households in Italy 
during the periods January 1 to Easter and July 1 to 
August 31. All calls were made between 7 P.M. and 
10 p.m., but 21.4% gave “ring-no-answer” in one 
period versus 41.5% “ring-no-answer” in the other 
period.”> Which period do you think had the higher 
rate of no answers? Why? Explain why a high rate of 
nonresponse makes sample results less reliable. 
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Retweeters. Twitter and Compete, a marketing ser- 
vices company, conducted a survey to investigate 
some of the characteristics of those who retweet (re- 
posting someone else’s tweet). Among other findings, 
it was found that Twitter users who retweet are de- 
mographically similar to those who don’t, use Twitter 
more often during the day, and are more likely to use 
Twitter on a mobile phone. Here is the methodology 
section contained with the survey results: 


The findings are based on data from surveys 
fielded in the United States during 2012. Twitter 
and Compete worked together to build a ques- 
tionnaire that asked respondents about their 
propensity to use Twitter and other services as 
well as the when, where, how and why of their 
usage patterns. Compete interviewed 655 Inter- 
net users in the U.S. for this study.”¢ 


(a) Explain in simple language why it is important to 
know how the sample was selected when drawing 
conclusions about a survey. 

(b) Do you feel that the methodology section ade- 
quately explains how this sample was selected? If not, 
what information is lacking and why is it important? 


Sampling pharmacists. All pharmacists in the Ca- 
nadian province of Ontario are required to be mem- 
bers of the Ontario College of Pharmacists. In 2013, 
there were 13,317 members of the college divided 
geographically into six voting districts. The number of 
members in each district follow:?’ 


District H K IC M N P 
Membership 2013 1602 4316 2771 1928 687 


Suppose the college is interested in obtaining members’ 
views and understanding of the 2012 Expanded Scope 
of Practice Regulations, which authorizes pharmacists 
to provide additional services including prescribing 
drug products for smoking cessation and administering 
the publicly funded influenza vaccine. To be sure that 
the opinions of all districts are represented, you choose 
a stratified random sample of five pharmacists from 
each district. Explain how you will assign labels within 
each district, and then give the label numbers for the 
five pharmacists in each of Districts H and K who 
will be part of your sample. Use software or Table B. 
If you use Table B, start at line 131 for District H and 
at line 124 for District K. Why should you not start 
at the same line in Table B to obtain your samples for 
Districts H and K? 


Sampling Amazon forests. Stratified samples are 
widely used to study large areas of forest. Based on 
satellite images, a forest area in the Amazon basin is 
divided into 14 types. Foresters studied the four most 
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commercially valuable types: alluvial climax forests 
of quality levels 1, 2, and 3, and mature secondary 
forest. They divided the area of each type into large 
parcels, chose parcels of each type at random, and 
counted tree species in a 20- by 25-meter rectangle 
randomly placed within each parcel selected. Here is 
some detail: 


FOREST TOTAL SAMPLE 
TYPE PARCELS SIZE 
Climax 1 36 a 
Climax 2 72 7 
Climax 3 31 3 
Secondary 42 4 


Choose the stratified sample of 18 parcels. Be sure to 
explain how you assigned labels to parcels. If you use 
Table B, start at line 112. 


Canadian health care survey. The Tenth Annual 
Health Care in Canada Survey is a survey of the opin- 
ions of the Canadian public and health care providers 
on a variety of health care issues, including quality of 
health care, access to health care, health and the en- 
vironment, and so forth. A description of the survey 
follows: 


The 10th edition of the Health Care in Canada 
Survey was conducted by POLLARA Research 
between October 3rd and November 8th, 2007. 
Results for the survey are based on telephone in- 
terviews with nationally representative samples 
of 1,223 members of the Canadian public, 202 
doctors, 201 nurses, 202 pharmacists and 201 
health managers. Public results are considered 
to be accurate within +2.8%, while the margin 
of error for results for doctors, nurses, pharma- 
cists and managers is +6.9%.** 


(a) Why is the accuracy greater for the public than 
for health care providers and managers? 

(b) Why do you think they sampled the public as well 
as health care providers and managers? 
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Systematic random samples. Systematic random sam- 
ples go through a list of the population at fixed intervals 
from a randomly chosen starting point. For example, a 
study of dating among college students chose a system- 
atic sample of 200 single male students at a university 
as follows.”? Start with a list of all 9000 single male 
students. Because 9000/200 = 45, choose one of the 
first 45 names on the list at random and then every 
45th name after that. For example, if the first name 
chosen is at position 23, the systematic sample consists 
of the names at positions 23, 68, 113, 158, and so on 
up to 8978. 


(a) Choose a systematic random sample of five names 
from a list of 200. If you use Table B, enter the table 
at line 120. 

(b) Like an SRS, a systematic sample gives all indi- 
viduals the same chance to be chosen. Explain why 
this is true, then explain carefully why a systematic 
sample is nonetheless not an SRS. 


More on systematic sampling. Foresters were inter- 
ested in studying a remote sensing measure of stand- 
ing timber as an alternative to taking measurements 
on the ground. The study area was a 1200 acre pine 
forest in Louisiana on which the U.S. Forest Service 
first created a grid of 1410 equally spaced circular 
plots of 0.05 acre in size over a map of the forest. The 
ground survey then visited every 10th plot and took 
measurements of tree volume.*° 


(a) Assuming the plots are numbered from 1 to 
1410, using the information in Exercise 8.43, 
explain how you will select a systematic sample of 
every 10th plot. 

(b) Now choose a systematic random sample of 141 
plots for the ground survey. If you use Table B, start 
at line 125. 


Why random digit dialing is common. The list of in- 
dividuals from which a sample is actually selected is 
called the sampling frame. Ideally, the frame should list 
every individual in the population, but in practice this 
is often difficult. A frame that leaves out part of the 
population is a common source of undercoverage. 


(a) Suppose that a sample of households in a commu- 
nity is selected at random from the telephone direc- 
tory. What households are omitted from this frame? 
What types of people do you think are likely to live 
in these households? These people will probably be 
underrepresented in the sample. 

(b) It is usual in telephone surveys to use random 
digit dialing equipment that selects the last four 
digits of a telephone number at random after being 
given the exchange (the first three digits), as 
described in Example 8.5. Which of the households 
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that you mentioned in your answer to part (a) will be 
included in the sampling frame by random digit dialing? 


Alternative energy favored. A Gallup poll finds that, 
among all Americans, 64% prefer an emphasis on the 
development of alternative energy production, such as 
wind and solar power, to an emphasis on production 
of traditional fossil fuels. This preference is strongest 
among young Americans, with older age groups each 
successively favoring it less. According to the survey 
methods section, “Results for this Gallup poll are 
based on telephone interviews conducted March 6-9, 
2014, with a random sample of 1,048 adults, aged 18 
and older, living in all 50 U.S. states and the District of 
Columbia. Interviews are conducted with respondents 
on landline telephones and cellular phones. Landline 
and cellphone numbers are selected using random- 
digit-dial methods. Landline respondents are chosen at 
random within each household on the basis of which 
member had the most recent birthday.”*! 


(a) The survey wants the opinion of an individual 
adult, but a landline phone reaches a household in 
which several adults may live. In that case, the survey 
interviewed the adult with the most recent birthday. 
Why is this preferable to simply interviewing the 
person who answers the phone? 

(b) What is the population that this survey wants to 
describe? Why do you think it is important to include 
both landline and cellular phones in your sample? 
Wording survey questions. Comment on each of the 
following as a potential sample survey question. Is the 
question sufficiently clear? Is it slanted toward a de- 
sired response? 

(a) “In light of skyrocketing gasoline prices, we 
should consider opening up a very small amount of 
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Alaskan wilderness for oil exploration as a way of 
reducing our dependence on foreign oil. Do you agree 
or disagree?” 

(b) “Do you agree that a national system of health 
insurance should be favored because it would provide 
health insurance for everyone and would reduce 
administrative costs?” 

(c) “In view of the negative externalities in parent la- 
bor force participation and pediatric evidence associ- 
ating increased group size with morbidity of children 
in day care, do you support government subsidies for 
day care programs?” 


Your own bad questions. Write your own examples of 
bad sample survey questions. 


(a) Write a biased question designed to get one an- 
swer rather than another. 

(b) Write a question to which many people may not 
give truthful answers. 


The Canadian census. The Canadian government’s 
decision to eliminate the mandatory long-form ver- 
sion of the census and to move these questions to an 
optional survey has many concerned. Many members 
of the business community and economists stressed 
the importance of the census data for crafting public 
policy. The minister of industry was given the task of 
defending the government’s decision. In response to 
an argument that making the long form of the census 
voluntary would skew the data by eliminating the sta- 
tistical randomness of the survey, the minister replied: 
“Wrong. Statisticians can ensure validity with a larger 
sample size.”*” Is the minister correct? If not, explain in 
simple terms the error in his statement. 


Poor survey designs. The website for the American Association for Public 
Opinion Research discusses several issues about polls. This information can be 
found at www. aapor.org. Under Resources & Education, click on the link Poll 
& Survey FAQ. Click on the link What Information Should Survey Researchers 
Disclose, and read Section A for recommendations about the information that 
should be available about how the poll was conducted. After reading Section A, go 
back to the previous web page and click on the link What Is a Random Sample? 
and then the link Bad Samples for some examples of flawed samples. 


(a) You are going to design a survey at your university. Give a question of 
interest and two examples of bad ways to collect your sample, along with the likely 


direction of bias that would result. Explain your answers. 
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(b) How would you modify your examples in part (a) to produce a better 
sample? What are some difficulties you might encounter when collecting your 
sample? 


Find a survey. The website for the Pew Research Center for the People and 
the Press is www. people-press.org. Go to the website and read one of the 
featured surveys. What information under Section A of What Information Should 
Survey Researchers Disclose from Exercise 8.50 is included in the featured survey 
you have read? You may find the concluding section entitled About the Survey 
and some of the links at the end of the featured survey helpful for answering this 
question. 


Ambrophoto/Shutterstock 


Producing Data: 
Experiments 


sample survey aims to gather information about a population without 

disturbing the population in the process. Sample surveys are one kind of 

observational study. Other observational studies observe the behavior of ani- 
mals in the wild or the interactions between teacher and students in the classroom. 
This chapter is about statistical designs for experiments, a quite different means of 
producing data. 


In contrast to observational studies, experiments don’t just observe individuals or 
ask them questions. They actively impose some treatment to observe the response. 
Experiments can answer questions such as, “Does aspirin reduce the chance of a 
heart attack?” and “Do a majority of college students prefer Pepsi to Coke when 
they taste both without knowing which they are drinking?” 


CRAPTER 
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wistio'S 


‘ i ersus Experiment 
lt YOU JUST DON’T Observation V p 
UNDERSTAND An observational study observes individuals and measures variables of interest, 
A sample survey of journalists but does not attempt to influence the responses. The purpose of an observational 


and scientists found quite a 


eee ; study is to describe some group or situation. 
communication gap. Journalists think 


that scientists are arrogant, and An experiment, on the other hand, deliberately imposes some treatment on indi- 
scientists think that journalists are viduals to observe their responses. The purpose of an experiment is to study whether 
ignorant. We won't take sides, but the treatment causes a change in the response. 


here is one interesting result from the 
survey: 82% of the scientists agree 
that the “media do not understand 


statistics well enough to explain An observational study, even one based on a statistical sample, is a poor way to 
new findings” in medicine and other gauge the effect of a treatment. To see the response to a change, we must actually 
fields. impose the change. When our goal is to understand cause and effect, experiments are 


the only source for fully convincing data. For this reason, the distinction between 
observation and experiment is one of the most important in statistics. 


In laparoscopic surgery, a video camera and several thin instruments are inserted into 
the patient’s abdominal cavity. The surgeon uses the image from the video camera 
positioned inside the patient’s body to perform the procedure by manipulating the 
instruments that have been inserted. Because of the similarity in many of the skills 
involved in video games and laparoscopic surgery, it was hypothesized that surgeons 
with greater prior video game experience might acquire the skills required in lapa- 
roscopic surgery more easily. Thirty-three surgeons participated in the study and 
were classified into the three categories—never used, under three hours per day, 
and more than three hours per day—depending on the number of hours they played 
video games at the height of their video game use. It was found that those with more 
video game playing did better in a simulator program measuring laparoscopic skills. 
However, in their conclusions the authors correctly point out, “This is a correlational 
(observational) study and, therefore, causality cannot be definitely determined.”' 
Although the data showed a clear association between prior video game experience 
and improved scores in the simulator program, we cannot conclude that more video 
game playing caused the improvement. People who play more video games may be 
different from those who don't, in terms of both their interest and the natural skills 
that are required in video games. Those who played more video games may have 
scored better simply because they have more ability in areas such as fine motor skills, 
eye-hand coordination, and depth perception that are required in both video games 
and laparoscopic surgery. 

It is easy to imagine an experiment that would settle the issue of whether playing 
video games really causes improvement in laparoscopic skills. Choose half of a group 
of surgeons at random to be the “treatment” group. The remaining half becomes the 
“control” group. Require the treatment group to play video games on a regular basis 
for several weeks and require the control group to abstain from video games. This 
experiment isolates the effect of playing video games. See Exercise 9.8 for a description 
of such an experiment. m 


The point of Example 9.1 is the contrast between observing people who chose for 
themselves how many hours of video games to play, and an experiment that requires 
some people to play video games and others to abstain. When we simply observe 
people’s video game choices, the effect of choosing to play more video games is 
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confounded with (mixed up with) the characteristics of people who choose to play 
more. These characteristics are lurking variables (see page 147) that make it hard to 
see the true relationship between the explanatory and response variables. Figure 9.1 
shows the confounding in picture form. 


FIGURE 9.1 

Confounding: We can't distinguish the 
effects of previous video game experi- 
ence from the effects of interest and 


2 
CAUSE? natural skills. 


—___ > 


/ 


Confounding 


Two variables (explanatory variables or lurking variables) are confounded when 
their effects on a response variable cannot be distinguished from each other. 


because the explanatory variable is confounded with lurking variables. Well- 
designed experiments take steps to prevent confounding. 


A Observational studies of the effect of one variable on another often fail 


(3) LaunchPad Online Resources 


* The Snapshots video, Types of Studies, and the StatClips video, Types of 
Studies, both review the important distinctions between observational studies 
and experiments. 


Apply Your Knowledge 


9.1 Drink a Little, but Not a Lot. Many observational studies show that people 
who drink a moderate amount of alcohol have less heart disease than people 
who drink no alcohol or who drink heavily.’ (“Moderate” means one or two 
drinks a day for men and one drink a day for women.) 


(a) What are the explanatory and response variables? 

(b) Moderate drinkers as a group tend to be more likely to maintain a 
healthy weight, get enough sleep, and exercise regularly than those 
who drink no alcohol or drink heavily. Are the variables maintaining 
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a healthy weight, getting enough sleep, and exercising regularly 
explanatory variables, response variables, or lurking variables? 
Explain your reason. 


(c) Is the association between drinking and heart disease good reason to 
think that moderate drinking actually causes less heart disease? Explain 
why or why not? 

9.2 The Font Matters! In general, when trying to change your behavior, if the 
effort required is perceived as high, this will be an impediment to change, 
whether it is modifying your diet or your study habits. Researchers divide 
40 students into two groups of 20. The first group reads instructions for an 
exercise program printed in an easy-to-read font (Arial, 12 point), and the 
second group reads identical instructions in a difficult-to-read font (Brush, 
12 point). Each subject estimates how many minutes the program will take 
and also uses a seven-point rating scale to report whether they are likely to 
include the exercise program as part of their daily routine (7 = very likely). 
The researchers hypothesize that those reading about the exercise program 
in the more difficult-to-read font would estimate that the program would 
take longer and they would be less likely to make the exercise program part of 
their regular routine.’ Is this an experiment? Why or why not? What are the 
explanatory and response variables? 


9.3 Quitting Smoking and Risk for Type 2 Diabetes. Researchers studied a 
group of 10,892 middle-aged adults over a period of nine years. They found 
that smokers who quit had a higher risk of diabetes within three years of 
quitting than either nonsmokers or continuing smokers.* Does this show that 
stopping smoking causes the short-term risk for type 2 diabetes to increase? 
(Weight gain has been shown to be a major risk factor for developing type 
2 diabetes and is often a side effect of quitting smoking.) Based on this re- 
search, should you tell a middle-aged adult who smokes that stopping smok- 
ing can cause diabetes and advise him or her to continue smoking? Carefully 
explain your answers to both questions. 
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A study is an experiment when we actually do something to people, animals, or 
objects to observe the response. Because the purpose of an experiment is to reveal 
the response of one variable to changes in other variables, the distinction between 
explanatory and response variables is essential. Here is the basic vocabulary of 
experiments. 


Subjects, Factors, and Treatments 


The individuals studied in an experiment are often called subjects, particularly 
when they are people. 

The explanatory variables in an experiment are often called factors. 

A treatment is any specific experimental condition applied to the subjects. If an 
experiment has more than one factor, a treatment is a combination of specific values 
of each factor. 
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Do abandoned children placed in foster homes do better than similar children placed 
in an institution? The Bucharest Early Intervention Project found that the answer is 
a clear “yes.” The subjects were 136 young children abandoned at birth and living 
in orphanages in Bucharest, Romania. Half of the children, chosen at random, were 
placed in foster homes. The other half remained in the orphanages. The experiment 
compared these two treatments. There is a single factor, “type of care,” with two val- 
ues, foster and institutional care. When there is only one factor, the levels or values of 
the factor correspond to the treatments. The response variables included measures of 
mental and physical development.° (Foster care was not easily available in Romania at 
the time and so was paid for by the study. Exercise 10.22 on page 265 in the Data 
Ethics chapter for ethical questions concerning this experiment.) m 


What are the effects of repeated exposure to an advertising message? The answer 
may depend on both the length of the ad and how often it is repeated. An experiment 
investigated this question using undergraduate students as subjects. All subjects 
viewed a 40-minute television program that included ads for a digital camera. Some 
subjects saw a 30-second commercial; others, a 90-second version. The same com- 
mercial was shown either one, three, or five times during the program. 

This experiment has two factors: length of the commercial, with two values, and 
repetitions, with three values. The six combinations of one value of each factor form 
six treatments. Figure 9.2 shows the layout of the treatments. After viewing, all the 
subjects answered questions about their recall of the ad, their attitude toward the 
camera, and their intention to purchase it. These are the response variables. = 


Subjects assigned to treatment 


Factor B 
Repetitions 3 see a 30-second ad five 


times during the program. 


1 time 3 times 5 times 
30 1 2 3 
seconds 
FactorA 
Length FIGURE 9.2 
90 The treatments in the experimental 
seconds . : design of Example 9.3. Combinations 
of values of the two factors form six 
treatments. 


Examples 9.2 and 9.3 illustrate the advantages of experiments over observational 
studies. In an experiment, we can study the effects of the specific treatments we 
are interested in. By assigning subjects to treatments, we can avoid confounding. 
For example, observational studies of the effects of foster homes versus institutions 
on the development of children have often been biased because healthier or more 
alert children tend to be placed in homes. The random assignment in Example 9.2 
eliminates bias in placing the children. Moreover, we can control the environment 
of the subjects to hold constant factors that are of no interest to us, such as the 
specific product advertised in Example 9.3. 

Another advantage of experiments is that we can study the combined effects 
of several factors simultaneously. The interaction of several factors can produce 
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effects that could not be predicted from looking at the effect of each factor alone. 
Perhaps longer commercials increase interest in a product, and more commercials 
also increase interest, but if we both make a commercial longer and show it more 
often, viewers get annoyed and their interest in the product drops. The two-factor 
experiment in Example 9.3 will help us find out. 


(4) LaunchPad Online Resources 


* The StatBoards video, Factors and Treatments, identifies subjects, factors, 
treatments, and response variables in additional experiments. 


Apply Your Knowledge 


For the experiments described in Exercises 9.4 and 9.5, identify the subjects, the factors, 
the treatments, and the response variables. 


9.4 Adolescent Obesity. Adolescent obesity is a serious health risk affecting 
more than 5 million young people in the United States alone. Laparoscopic 
adjustable gastric banding has the potential to provide a safe and effective 
treatment. Fifty adolescents between 14 and 18 years old with a body mass 
index higher than 35 were recruited from the Melbourne, Australia, com- 
munity for the study. Twenty-five were randomly selected to undergo gas- 
tric banding, and the remaining 25 were assigned to a supervised lifestyle 
intervention program involving diet, exercise, and behavior modification. All 
subjects were followed for two years, and their weight loss was recorded.° 


9.5 Reactions to Simulated News Reports. A sample of University of Colo- 
rado students each viewed one of two simulated news reports about a ter- 
rorist bombing against the United States by a fictitious country. One report 
showed the bombing attack on a military target and the other on a cultural/ 
educational site. In addition, before viewing the news report, each student 
read one of two “primes.” The first was a prime for forgiveness based on the 
biblical saying “Love thy enemy,” and the second was a retaliatory prime based 
on the biblical saying “An eye for an eye, and a tooth for a tooth.” After view- 
ing the news report, the students were asked to rate ona scale of 1 to 12 what 
the U.S. reaction should be, with the lowest score (1) corresponding to the 
United States sending a special ambassador to the country and the highest 
score (12) corresponding to an all-out nuclear attack against the country.’ 


9.6 Ripening Mangoes. The mango is considered the “king of fruits” in many 
parts of the world. Mangoes are generally harvested at the mature green 
stage and ripen during the marketing process of transport, storage, and so 
on. During this process, about 30% of the fruit is wasted. Because of this, the 
impact of harvest stage and storage conditions on the postharvest quality are 
of interest. In this experiment, the fruit was harvested at 80, 95, or 110 days 
after the fruit setting (the transition from flower to fruit) and then stored at 
temperatures of 20, 30, or 40 degrees centigrade. For each harvest time and 
storage temperature, a random sample of mangoes was selected, and the time 
to ripening was measured.§ 


(a) What are the factors, the treatments, and the response variables? Use 
a diagram like Figure 9.2 to display the factors and treatments. 


(b) For simplicity, the experimenter thought about selecting nine mango 
trees and randomly assigning one tree to each of the nine treatments. 
All the mangoes on a tree would then receive the same treatment. 
Do you think this would be a good way to assign the mangoes to the 
treatments? Explain your reasoning in simple English. 
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Experiments are the preferred method for examining the effect of one variable on 
another. By imposing the specific treatment of interest and controlling other influ- 
ences, we can pin down cause and effect. Statistical designs are often essential for 
effective experiments. To see why, let’s look at an example in which an experiment 
suffers from confounding just as observational studies do. 


A college regularly offers a review course to prepare candidates for the Graduate 
Management Admission Test (GMAT), which is required by most graduate business 
schools. This year, it offers only an online version of the course. The average GMAT 
score of students in the online course is 10% higher than the longtime average for 
those who took the classroom review course. Is the online course more effective? 

This experiment has a very simple design. A group of subjects (the students) were 
exposed to a treatment (the online course), and the outcome (GMAT scores) was 
observed. Here is the design: 


Subjects —> Online course +» GMAT scores 


A closer look at the GMAT review course showed that the students in the online 
review course were quite different from the students who in past years took the class- 
room course. In particular, they were older and more likely to be employed. An online 
course appeals to these mature people, but we can’t compare their performance with 
that of the undergraduates who previously dominated the course. The online course 
might even be less effective than the classroom version. The effect of online versus 
in-class instruction is confounded with the effect of lurking variables. As a result of 
confounding, the experiment is biased in favor of the online course. 

Would the situation have been different if both the online and the classroom 
courses had been given this year? If students still chose the course they wanted, 
with older students tending to sign up for the online course and younger students 
tending to sign up for the classroom course, then the effect of type of course would 
still be confounded with the lurking variable age. The solution will be described in 
the next section. m 


Most laboratory experiments use a design like that in Example 9.4: 
Subjects —» Treatment — > Measure response 


In the controlled environment of the laboratory, simple designs often work 
A well. Field experiments and experiments with living subjects are exposed to 

more variable conditions and deal with more variable subjects. Outside the 
laboratory, uncontrolled experiments often yield worthless results because of confounding 
with lurking variables. 


Apply Your Knowledge 


9.7 Reducing Unemployment. Will cash bonuses speed the return to work of 
unemployed people? A state department of labor notes that, last year, 41% of 
people who filed claims for unemployment insurance found a new job within 
15 weeks. As an experiment, the state offers $500 to people filing unem- 
ployment claims if they find a job within 15 weeks. The percent who do 
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control group 


so increases to 53%. Suggest some conditions that might make it easier or 
harder to find a job this year as opposed to last year. Confounding with these 
lurking variables makes it impossible to say whether the $500 bonus really 
caused the increase. 


The remedy for the confounding in Example 9.4 is to be sure that we do a com- 
parative experiment in which some students are taught in the classroom and other, 
similar students take the course online. In Example 9.4, the classroom group is 
called a control group. A control group receives either a “standard” treatment or 
in some cases a “sham” treatment, and provides a basis for comparison with the 
other treatment groups. Although in many experiments one of the treatments is 
a control treatment, a comparative experiment does not require this. In Example 
9.4, a comparative experiment could simply compare two newly developed online 
courses without including a control or classroom treatment. Most well-designed 
experiments compare two or more treatments. Part of the design of an experiment 
is a description of the factors (explanatory variables) and the layout of the treat- 
ments, with comparison as the leading principle. 

However, as discussed at the end of Example 9.4, comparison alone isn’t enough to 
produce results we can trust. If the treatments are given to groups that differ mark- 
edly when the experiment begins, bias will result. If we allow students to select online 
or classroom instruction, students who are older and employed are likely to sign up 
for the online course. Personal choice will bias our results in the same way that vol- 
unteers bias the results of online opinion polls. The solution to the problem of bias 
in sampling is random selection, and the same is true in experiments. The subjects 
assigned to any treatment should be chosen at random from the available subjects. 


Randomized Comparative Experiment 


An experiment that uses both comparison of two or more treatments and random 
assignment of subjects to treatments is a randomized comparative experiment. 


FIGURE 9.3 

Outline of a randomized comparative 
experiment to compare online and class- 
room instruction, for Example 9.5. 


The college decides to compare the progress of 25 on-campus students taught in 
the classroom with that of 25 students taught the same material online. Select the 
students who will be taught online by taking a simple random sample of size 25 from 
the 50 available subjects. The remaining 25 students form the control group. They 
will receive classroom instruction. The result is a randomized comparative experiment 
with two groups. Figure 9.3 outlines the design in graphical form. 


Group 1 Treatment 1 
25 students : Online 
Random Compare 
assignment SS ra GMAT scores 
Group 2 Treatment 2 
25 students Classroom 
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The selection procedure is exactly the same as it is for sampling. 


Label: Label the 50 students O1 to 50. 


Table: Go to the table of random digits and read successive two-digit groups. The 
first 25 labels encountered select the online group. As usual, ignore repeated labels 
and groups of digits not used as labels. For example, if you begin at line 125 in Table 
B, the first five students chosen are those labeled 21, 49, 37, 18, and 44. Software 
such as the Simple Random Sample applet makes it particularly easy to choose treat- 
ment groups at random. m 


The design in Example 9.5 is comparative because it compares two treatments 
(the two instructional settings). It is randomized because the subjects are assigned 
to the treatments by chance. This “flowchart” outline in Figure 9.3 presents all the 
essentials: randomization, the sizes of the groups and which treatment they receive, 
and the response variable. There are, as we will see later, statistical reasons for gen- 
erally using treatment groups that are about equal in size. We call designs like that 
in Figure 9.3 completely randomized. 


Completely Randomized Design 


In a completely randomized experimental design, all the subjects are allocated at 
random among all the treatments. 


Completely randomized designs can compare any number of treatments. Here is 
an example that compares three treatments. 


Many utility companies have introduced programs to encourage energy conservation 
among their customers. An electric company considers placing small digital displays 
in households to show current electricity use and what the cost would be if this use 
continued for a month. Will the displays reduce electricity use? Would cheaper meth- 
ods work almost as well? The company decides to conduct an experiment. 

One cheaper approach is to give customers a chart and information about moni- 
toring their electricity use from their outside meter. The experiment compares these 
two approaches (display, chart) and also a control. The control group of customers 
receives information about energy conservation but no help in monitoring electricity 
use. The response variable is total electricity used in a year. The company finds 60 
single-family residences in the same city willing to participate, so it assigns 20 resi- 
dences at random to each of the three treatments. Figure 9.4 outlines the design. 


Group 1 Treatment 1 
20 houses : Display 


/ 


Random Group 2 Treatment 2 Compare 
assignment 20 houses Chart electricity 
use 


Group 3 Treatment 3 
20 houses Control 


FIGURE 9.4 

Outline of a completely randomized 
design comparing three energy-saving 
programs, for Example 9.6. 


236 CHAPTER 9 @ Producing Data: Experiments 


60 and the sample size to 20 and click “Sample.” The 20 households chosen 
receive the displays. The “Population hopper” now contains the 40 remaining 
households, in scrambled order. Click “Sample” again to choose 20 of these to 
receive charts. The 20 households remaining in the “Population hopper” form the 
control group. 

To use Table B, label the 60 households 01 to 60. Enter the table to select an 
SRS of 20 to receive the displays. Continue in Table B, selecting 20 more to receive 
charts. The remaining 20 form the control group. m 


& To use the Simple Random Sample applet, set the population labels as 1 to 


Examples 9.5 and 9.6 describe completely randomized designs that compare 
values of a single factor. In Example 9.5, the factor is the type of instruction. In 
Example 9.6, it is the method used to encourage energy conservation. Completely 
randomized designs can have more than one factor. The advertising experiment of 
Example 9.3 has two factors: the length and the number of repetitions of a television 
commercial. Their combinations form the six treatments outlined in Figure 9.2. A 
completely randomized design assigns subjects at random to these six treatments. 
Once the layout of treatments is set, the randomization needed for a completely 
randomized design is tedious but straightforward. 


() LaunchPad Online Resources 


* The StatBoards video, Outlining an Experiment, provides additional 
examples of outlining the design of an experiment using figures similar to 
those given in this section. 


Apply Your Knowledge 


9.8 Does Playing Video Games Improve Surgical Skills? Another Look. In 
laparoscopic surgery, a video camera and several thin instruments are inserted 
into the patient’s abdominal cavity. The surgeon uses the image from the 
video camera positioned inside the patient’s body to perform the procedure 
by manipulating the instruments that have been inserted. It has been found 
that the Nintendo Wii™, with its motion-sensing interface, reproduces the 
movements required in laparoscopic surgery more closely than other video 
games. If training with a Nintendo Wii™ can improve laparoscopic skills, it 
can complement the more expensive training on a laparoscopic simulator. 
Forty-two medical residents were chosen and all were tested on a set of basic 
laparoscopic skills. Twenty-one were selected at random to undergo system- 
atic Nintendo Wii™ training for one hour a day, five days a week, for four 
weeks. The remaining 21 residents were given no Nintendo Wii™ training 
and asked to refrain from video games during this period. At the end of four 
weeks, all 42 residents were tested again on the same set of laparoscopic 
skills. The difference in time (before — after) to complete a virtual gall blad- 
der removal on the simulator was measured.” 


(a) Compare the study described above with the study in Example 9.1. 
What are the important differences between the two studies? Ignoring 
the fact that different video games and measures of laparoscopic skill 
were used in the two studies, explain in simple language which study 
gives stronger evidence that playing video games is helpful in improving 
laparoscopic skills and why. 


(b) Outline the design of this experiment, following the model of Figure 
9.3. What is the response variable? 
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(c) Carry out the random assignment of 21 residents to the Nintendo 
training group, using the Simple Random Sample applet, other software, 
or Table B, starting at line 130. 


9.9 More Rain for California? The changing climate will probably bring more 
rain to California, but we don’t know whether the additional rain will come 
during the winter wet season or extend into the long dry season in spring and 
summer. Kenwyn Suttle of the University of California at Berkeley and his co- 
workers carried out a randomized controlled experiment to study the effects 
of more rain in either season. They randomly assigned plots of open grassland 
to three treatments: added water equal to 20% of annual rainfall either during 
January to March (winter) or during April to June (spring), and no added water 
(control). Thirty-six circular plots of area 70 square meters were available (see 
the photo), of which 18 were used for this study. One response variable was 
total plant biomass, in grams per square meter, produced in a plot over a year.’ 


(a) Outline the design of the experiment, following the model of Figure 9.4. 
(b) Number all 36 plots and choose six at random for each of the three 
treatments. Be sure to explain how you did the random selection. 


9.10 Effects of TV Advertising. Figure 9.2 displays the six treatments for the 
two-factor experiment on TV advertising described in Example 9.3. The 24 
students named here will serve as subjects. Outline the design and randomly 
assign the subjects to the six treatments, an equal number of subjects to each 
treatment. If you use Table B, start at line 135. 


Abramson Weingold Cohen Cressie Linder Stanley 


Anthony Blake Santner Kessis Minor Tory 
Pearl Brower Delp Lahr Carson Verducci 
Baker Knab Disbro Kruger Shi Walsh 


Randomized comparative experiments are designed to give good evidence that 
differences in the treatments actually cause the differences we see in the response. 
The logic is as follows: 


= Random assignment of subjects forms groups that should be similar in all 
respects before the treatments are applied. Exercise 9.52 uses the Simple 
Random Sample applet to demonstrate this. 


= A comparative experiment with randomization ensures that influences other 
than the experimental treatments operate equally on all groups. 


u Therefore, differences in average response must be due either to the treatments 
or to the play of chance in the random assignment of subjects to the treatments. 


That “either—or” deserves more thought. In Example 9.5, we cannot say that any 
difference between the average GMAT scores of students enrolled online and in the 
classroom must be caused by a difference in the effectiveness of the two types of 
instruction. There would be some difference even if both groups received the same 
instruction, because of variation among students in background and study habits. 
Chance assigns students to one group or the other, and this creates a chance differ- 
ence between the groups. We would not trust an experiment with just one student 
in each group, for example. The results would depend too much on which group got 
lucky and received the stronger student. If we replicate the experiment by assigning 


Suttle, K. B., M. A. Thomsen, and M. E. Power. 
“Species Interactions Reverse Grassland Responses 
to Changing Climate.” Science 315.5812 (2007): 
640-42. 


ay WHAT’S NEWS? 
Randomized comparative 


experiments provide the best 
evidence for medical advances. 

Do newspapers care? Maybe not. 
University researchers looked at 
1192 articles in medical journals, of 
which 7% were turned into stories 
by the two newspapers examined. 
Of the journal articles, 37% 
concerned observational studies, 
and 25% described randomized 
experiments. Among the articles 
publicized by the newspapers, 58% 
were observational studies, and only 
6% were randomized experiments. 
Conclusion: the newspapers want 
exciting stories, especially bad-news 
stories, whether or not the evidence 
is good. 
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many subjects to each group, the effects of chance will average out, and there will 
be little difference in the average responses in the two groups unless the treatments 

replication _ themselves cause a difference. Replication, or the use of enough subjects to reduce 
chance variation, is the third big idea of statistical design of experiments. 


Principles of Experimental Design 


The basic principles of statistical design of experiments are 


1. Control—testrict the effects of lurking variables on the response, most simply 
by comparing two or more treatments. 


2. Randomization—use chance to assign subjects to treatments. 


3. Replication—use enough subjects in each group to reduce chance variation in 
the results. 


We hope to see a difference in the responses between the two groups so large 
that it is unlikely to happen just because of chance variation. We can use the laws 
of probability, which describe chance behavior, to learn if the treatment effects are 
larger than we would expect to see if only chance were operating. If they are, we 
call them statistically significant. 


Statistical Significance 


An observed effect so large that it would rarely occur by chance is called statistically 
significant. 


If we observe statistically significant differences among the groups in a random- 
ized comparative experiment, we have good evidence that the treatments actually 
caused these differences. You will often see the phrase “statistically significant” in 
reports of investigations in many fields of study. The great advantage of randomized 
comparative experiments is that they can produce data that give good evidence for 
a cause-and-effect relationship between the explanatory and response variables. We 
know that in general a strong association does not imply causation. A statistically 
significant association in data from a well-designed experiment does imply causation. 


() LaunchPad Online Resources 


* The Snapshots video, Experimental Design, reviews the important ideas 
of randomized controlled experiments in the context of a study with five 
treatments. 


Apply Your Knowledge 


9.11 Prayer and Meditation. You read in a magazine that “nonphysical treat- 
ments such as meditation and prayer have been shown to be effective in 
controlled scientific studies for such ailments as high blood pressure, insom- 
nia, ulcers, and asthma.” Explain in simple language what the article means 
by “controlled scientific studies.” Why can such studies in principle provide 
good evidence that, for example, meditation is an effective treatment for 
high blood pressure? 


9.12 Conserving Energy. Example 9.6 describes an experiment to learn 
whether providing households with digital displays or charts will reduce 
their electricity consumption. An executive of the electric company objects 


9.6 Cautions About Experimentation 


to including a control group. He says: “It would be simpler to just compare 
electricity use last year (before the display or chart was provided) with con- 
sumption in the same period this year. If households use less electricity this 
year, the display or chart must be working.” Explain clearly why this design 
is inferior to that in Example 9.6. 

9.13 Healthy Diet and Cataracts. The relationship between healthy diet and 
prevalence of cataracts was assessed using a sample of 1808 participants from 
the Women’s Health Initiative Observational Study. Having a high Healthy 
Eating Index score was the strongest predictor of a reduced risk of cataracts, 
among modifiable behaviors considered. The Healthy Eating Index score 
was created by the U.S. Department of Agriculture and measures how well a 
person’s diet conforms to recommended healthy eating patterns. The report 
concludes: “These data add to the body of evidence suggesting that eating 
foods rich in a variety of vitamins and minerals may contribute to postponing 
the occurrence of the most common type of cataract in the United States.”!! 


(a) Explain why this is an observational study rather than an experiment. 

(b) Although the result was statistically significant, the authors did not 
use strong language in stating their conclusions, instead using words 
such as “suggesting” and “may.” Do you think that their language is 
appropriate given the nature of the study? Why? 


The logic of a randomized comparative experiment depends on our ability to 
treat all the subjects identically in every way except for the actual treatments being 
compared. Good experiments therefore require careful attention to details to ensure 
that all subjects really are treated identically. 

If some subjects in a medical experiment take a pill each day and a control group 
takes no pill, the subjects are not treated identically. Many medical experiments are 
therefore “placebo controlled.” A study of the effects of taking vitamin E on heart 
disease is a good example. All the subjects receive the same medical attention during 
the several years of the experiment. All take a pill every day: vitamin E in the treat- 
ment group, and a placebo in the control group. A placebo is a dummy treatment 
that is as similar to the treatment as possible but contains no “active ingredient.” In 
this experiment, the placebo would be a pill that looked like the vitamin E pill, yet 
contained no active ingredient. As a second example, a study compared arthroscopic 
surgery versus no surgery on several recovery outcomes for a partial meniscus tear.!” 
Patients randomly assigned to the “no surgery” group received a sham surgery in 
which the surgeon asked for all instruments, manipulated the knee as in surgery and 
kept the sham surgery sufficiently realistic so that afterwards patients were unaware 
of whether they received the actual surgery or the sham surgery. 

Many patients respond favorably to any treatment, even a placebo, perhaps 
because they trust the doctor or they believe that the treatment will work. The 
favorable response to a placebo treatment or a treatment with no therapeutic value 
is called the placebo effect. If the control group did not take any pills, the effect of 
vitamin E in the treatment group would be confounded with the placebo effect, the 
effect of simply taking pills. That is, if the vitamin E group improved, we wouldn’t 
know whether it was due to simply taking a pill, or the actual vitamin E content 
of the pill. Having a placebo group for comparison allows us to see if the effect of 


placebo 


ay SCRATCH MY 
FURRY EARS 

Rats and rabbits, specially bred 

to be uniform in their inherited 
characteristics, are the subjects in 
many experiments. Animals, like 
people, are quite sensitive to how 
they are treated. This can create 
opportunities for hidden bias. For 
example, human affection can change 
the cholesterol level of rabbits. 
Choose some rabbits at random and 
regularly remove them from their 
cages to have their heads scratched 
by friendly people. Leave other rabbits 
unloved. All the rabbits eat the same 
diet, but the rabbits that receive 
affection have lower cholesterol. 
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taking a vitamin E pill in the treatment group is larger than the effect of simply 
taking a pill in the placebo group. 

In addition, such studies are usually double-blind. The subjects don’t know 
whether they are taking vitamin E or a placebo. Neither do the medical person- 
nel who work with them. The double-blind method avoids unconscious bias by, for 
example, a doctor who is convinced that a vitamin must be better than a placebo. 
In many medical studies, only the statistician who does the randomization knows 
which treatment each patient is receiving. 


Double-Blind Experiments 


In a double-blind experiment, neither the subjects nor the people who interact with 
them know which treatment each subject is receiving. 


When testing a drug against a placebo, we indicated that the placebo contains 
no “active ingredient,” but the situation can be more complex than this would 
suggest. Since many drugs being tested have known side effects, such as dry 
mouth, an “active placebo,” may be used. An active placebo contains ingredients 
designed to mimic the side effects of the drug, but does not treat the particular 
disease.? This is important in blinding as it can prevent the subject from being 
aware of whether they received the drug or the placebo. Drug companies create 
these placebo drugs and many do not provide information regarding the contents 
of an active placebo. Unfortunately, this then allows drug companies to make 
advertising claims such as that the occurrence of dry mouth was similar in the 
drug and the placebo group. 

Placebo controls and the double-blind method are more ways to eliminate pos- 
sible confounding. But even well-designed experiments often face another problem: 

lack of realism lack of realism. Practical constraints may mean that the subjects or treatments or 
setting of an experiment don’t realistically duplicate the conditions we really want 
to study. Here are two examples. 


The study of television advertising in Example 9.3 showed a 40-minute video to 
students who knew an experiment was going on. We can’t be sure that the results 
apply to everyday television viewers. Many behavioral science experiments use as 
subjects students or other volunteers who know they are subjects in an experiment. 
That’s not a realistic setting. m 


Do those high center brake lights, required on all cars sold in the United States since 
1986, really reduce rear-end collisions? Randomized comparative experiments with 
fleets of rental and business cars, done before the lights were required, showed that 
the third brake light reduced rear-end collisions by as much as 50%. Alas, requiring 
the third light in all cars led to only a 5% drop. 

What happened? Most cars did not have the extra brake light when the experi- 
ments were carried out, so it caught the eye of following drivers. Now that almost all 
cars have the third light, they no longer capture attention. m 


9.7 Matched Pairs and Other Block Designs 241 


experiment to the settings of greatest interest. Most experimenters want to 

generalize their conclusions to some setting wider than that of the actual 
experiment. Statistical analysis of an experiment cannot tell us how far the results will 
generalize. Nonetheless, the randomized comparative experiment, because of its 
ability to give convincing evidence for causation, is one of the most important ideas 
in statistics. 


A Lack of realism can limit our ability to apply the conclusions of an 


Apply Your Knowledge 


9.14 Testosterone for Older Men. As men age, their testosterone levels gradu- 
ally decrease. This may cause a reduction in lean body mass, an increase 
in fat, and other undesirable changes. Do testosterone supplements reverse 
some of these effects? A study in the Netherlands assigned 237 men aged 
60 to 80 with low or low-normal testosterone levels to either a testosterone 
supplement or a placebo. The report in the Journal of the American Medi- 
cal Association described the study as a “double-blind, randomized, placebo- 
controlled trial.”'* Explain each of these terms to someone who knows no 
statistics. 


9.15 Does Meditation Reduce Anxiety? An experiment that claimed to show 
that meditation reduces anxiety proceeded as follows. The experimenter in- 
terviewed the subjects and rated their level of anxiety. Then the subjects 
were randomly assigned to two groups. The experimenter taught one group 
how to meditate, and they meditated daily fora month. The other group was 
simply told to relax more. At the end of the month, the experimenter inter- 
viewed all the subjects again and rated their anxiety level. The meditation 
group now had less anxiety. Psychologists said that the results were suspect 
because the ratings were not blind. Explain what this means and how lack of 
blindness could bias the reported results. 


Completely randomized designs are the simplest statistical designs for experi- 
ments. They illustrate clearly the principles of control, randomization, and adequate 
number of subjects. However, completely randomized designs are often inferior to 
more elaborate statistical designs. In particular, matching the subjects in various 
ways can produce more precise results than simple randomization. 

One common design that combines matching with randomization is the matched matched pairs design 
pairs design. A matched pairs design compares just two treatments. Choose pairs 
of subjects that are as closely matched as possible. Use chance to decide which 
subject in a pair gets the first treatment. The other subject in that pair gets the 
other treatment. That is, the random assignment of subjects to treatments is done 
within each matched pair, not for all subjects at once. Sometimes each “pair” in 
a matched pairs design consists of just one subject, who gets both treatments one 
after the other. Each subject serves as his or her own control. The order of the 
treatments can influence the subject’s response, so we randomize the order for 
each subject. 
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Does talking on a hands-free cell phone distract drivers? Undergraduate students 
“drove” in a high-fidelity driving simulator equipped with a hands-free cell phone. The 
car ahead brakes: how quickly does the subject react? Let’s compare two designs for 
this experiment. There are 40 student subjects available. 

In design 1, the completely randomized design, all 40 subjects are assigned at 
random: 20 simply to drive, and the other 20 to talk on the cell phone while driving. 
In design 2, the matched pairs design that was actually used, all subjects drive both 
with and without using the cell phone. The two drives are on separate days to reduce 
carryover effects. The order of the two treatments is assigned at random: 20 subjects 
are chosen to drive first with the phone, and the remaining 20 drive first without the 
phone. '° 

Some subjects naturally react faster than others. The completely randomized 
design relies on chance to distribute the faster subjects roughly evenly between the 
two groups. The matched pairs design compares each subject’s reaction time with and 
without the cell phone. This makes it easier to see the effects of using the phone. m 


© Tom Grill/Corbis 


Matched pairs designs use the principles of comparison of treatments and ran- 
domization. However, the randomization is not complete—we do not randomly 
assign all the subjects at once to the two treatments. Instead, we randomize only 
within each matched pair. This allows matching to reduce the effect of variation 
among the subjects. Matched pairs are one kind of block design, with each pair 
forming a block. 


Block Design 


A block is a group of individuals that are known before the experiment to be similar 
in some way that is expected to affect the response to the treatments. 

Ina block design, the random assignment of individuals to treatments is carried out 
separately within each block. 


A block design combines the idea of creating equivalent treatment groups by 
matching with the principle of forming treatment groups at random. Blocks are 
another form of control. They control the effects of some outside variables by bring- 
ing those variables into the experiment to form the blocks. Here are some typical 
examples of block designs. 


Women and men respond differently to advertising. An experiment to compare the 
effectiveness of three advertisements for the same product will want to look sepa- 
rately at the reactions of men and women, as well as to assess the overall response 
to the ads. 

A completely randomized design considers all subjects, both men and women, as 
a single pool. The randomization assigns subjects to three treatment groups without 
regard to their sex. This ignores the differences between men and women. A block 
design considers women and men separately. Randomly assign the women to three 
groups, with one to view each advertisement. Then separately assign the men at 
random to three groups. Figure 9.5 outlines this improved design. = 


9.7 Matched Pairs and Other Block Designs 243 


Assignment to blocks G 1 
is not random. oe roup 1 ——>* Ad1 — 


Random Group 2 Ado Compare 


assignment \ ye reaction 
Group 3 ——> Ad3 


Women —> 


Subjects 


FIGURE 9.5 
Outline of a block design, for 
Example 9.10. The blocks consist 


Group1 ——>+ Ad1 


Met Random Compare 


—-> Group2 ——» Ad2 —> 


of male and female subjects. The assignment reaction 
treatments are three advertisements % ve 
Group 3 ——> Ad3 


for the same product. 


A social policy experiment will assess the effect on family income of several proposed 
new welfare systems and compare them with the present welfare system. Because 
the future income of a family is strongly related to its present income, the families 
who agree to participate are divided into blocks of similar income levels. The families 
in each block are then allocated at random among the welfare systems. m 


A block design allows us to draw separate conclusions about each block, for exam- 
ple, about men and women in Example 9.10. Blocking also allows more precise over- 
all conclusions because the systematic differences between men and women can be 
removed when we study the overall effects of the three advertisements. The idea of 
blocking is an important additional principle of statistical design of experiments. A 
wise experimenter will form blocks based on the most important unavoidable sources 
of variability among the subjects. Randomization will then average out the effects of 
the remaining variation and allow an unbiased comparison of the treatments. 

Like the design of samples, the design of complex experiments is a job for experts. 
Now that we have seen a bit of what is involved, we will concentrate for the most 
part on completely randomized experiments. 


Apply Your Knowledge 


9.16 Comparing Breathing Frequencies in Swimming. Researchers from 
the United Kingdom studied the effect of two breathing frequencies on both 
performance times and several physiological parameters in front crawl swim- 
ming.’ The breathing frequencies were one breath every second stroke (B2) 
and one breath every fourth stroke (B4). Subjects were 10 male collegiate 
swimmers. Each subject swam 200 meters, once with breathing frequency B2 
and once on a different day with breathing frequency B4. 


(a) Describe the design of this matched pairs experiment, including the 
randomization required by this design. 


(b) Could this experiment be conducted using a completely randomized 
design? How would the design differ from the matched pairs experiment? 


(c) Suppose we allow each swimmer to choose their own breathing fre- 
quency and then swim 200 meters using their selected frequency. Are 
there any problems with then comparing the performance of the two 
breathing frequencies? 
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Alysha Fligner 


Grandma’s Kitchen 


9.17 Font Naturalness and Perceived Healthiness. Can the font used in a 
packaged product affect our perception of the product’s healthiness? It was 
hypothesized that use of a natural font, which looks more handwritten and 
tends to be more slanted and curved, would lead to a higher perception of 
product healthiness than an unnatural font. Two fonts, Impact and Sketch- 
flow Print, were used. These fonts were shown in a previous study to differ 
in their perceived naturalness, but otherwise were rated similarly on factors 
such as readability and likeability. Images of two identical packages differing 
only in the font used were presented to the subjects. Participants read state- 
ments such as, “This product is healthy”/natural/wholesome/organic, and 
they rated how much they agreed with the statement on a seven-point scale 
with “1” indicating strong agreement and “7” indicating strong disagreement. 
Each subject’s responses were combined to create a perceived healthiness 
score.!’ The researcher had 100 students available to serve as subjects. 


(a) What are the response and the explanatory variables? 


(b) Describe the design of a completely randomized experiment to learn 
the effect of font naturalness on perceived healthiness. 


(c) Describe the design of a matched pairs experiment using the same 
100 subjects. 


9.18 Technology for Teaching Statistics. The Brigham Young University 
statistics department is performing randomized comparative experiments to 
compare teaching methods. Response variables include students’ final-exam 
scores and a measure of their attitude toward statistics. One study compares 
two levels of technology for large lectures: standard (overhead projectors and 
chalk) and multimedia. The individuals in the study are the eight lectures 
in a basic statistics course. There are four instructors, each of whom teaches 
two lectures. Because the lecturers differ, their lectures form four blocks.!® 
Suppose the lectures and lecturers are as follows: 


Lecture Lecturer Lecture Lecturer 
1 Hilton 5 Tolley 
2, Christensen 6 Hilton 
3 Hadfield 7 Tolley 
4 Hadfield 8 Christensen 


Outline a block design and do the randomization that your design requires. 


CHAPTER 9 SUMMARY 


Chapter Specifics 


m Wecan produce data intended to answer specific questions by observational studies or 


experiments. Sample surveys that select a part of a population of interest to represent 
the whole are one type of observational study. Experiments, unlike observational stud- 
ies, actively impose some treatment on the subjects of the experiment. 


Variables are confounded when their effects on a response can’t be distinguished from 
each other. Observational studies and uncontrolled experiments often fail to show 
that changes in an explanatory variable actually cause changes in a response variable 
because the explanatory variable is confounded with lurking variables. 

In an experiment, we impose one or more treatments on individuals, often called sub- 
jects. Each treatment is a combination of values of the explanatory variables, which we 
call factors. 


The design of an experiment describes the choice of treatments and the manner in 
which the subjects are assigned to the treatments. The basic principles of statistical 


design of experiments are control and randomization to combat bias and replication 
(using enough subjects) to reduce chance variation. 


m The simplest form of control is comparison. Experiments should compare two or more 
treatments to avoid confounding of the effect of a treatment with other influences, 
such as lurking variables. 


m Randomization uses chance to assign subjects to the treatments. Randomization cre- 
ates treatment groups that are similar (except for chance variation) before the treat- 
ments are applied. Randomization and comparison together prevent bias, or systematic 
favoritism, in experiments. 


m You can carry out randomization by using software or by giving numerical labels to the 
subjects and using a table of random digits to choose treatment groups. 


m Applying each treatment to many subjects reduces the role of chance variation and 
makes the experiment more sensitive to differences among the treatments. 


= Good experiments also require attention to detail as well as good statistical design. Many 
behavioral and medical experiments are double-blind. Some give a placebo to a control 
group. Lack of realism in an experiment can prevent us from generalizing its results. 


m In addition to comparison, a second form of control is to restrict randomization by 
forming blocks of individuals that are similar in some way that is important to the 
response. Randomization is then carried out separately within each block. 


m Matched pairs are a common form of blocking for comparing just two treatments. In 
some matched pairs designs, each subject receives both treatments in a random order. 
In others, the subjects are matched in pairs as closely as possible, and each subject in a 
pair receives one of the treatments. 


Link It 


Observational studies and experiments are two methods for producing data. Observational stud- 
ies are useful when the conclusion involves describing a group or situation without disturbing 
the scene we observe. Sample surveys, discussed in Chapter 8, are an important type of obser- 
vational study in which we draw conclusions about a population by observing only a part of the 
population (the sample). In contrast, experiments are used when the situation calls for a conclu- 
sion about whether a treatment causes a change in a response. The distinction between observa- 
tional studies and experiments will be important when stating your conclusions in later chapters. 

Only well-designed experiments provide a sound basis for concluding cause-and-effect 
relationships. In a simple comparative experiment, two treatments are imposed on two groups 
of individuals. Reaching the conclusion that the difference between the groups is caused by 
the treatments, rather than lurking variables, requires that the two groups of individuals be 
similar at the outset. A randomized comparative experiment is used to create groups that are 
similar. If there is a sufficiently large difference between the groups after imposing the treat- 
ments, we can say that the results are statistically significant and conclude that the differences 
in the response were caused by the treatments. In later chapters, the specific statistical proce- 
dures for reaching these conclusions will be described. 

As with sampling, when conducting an experiment, attention to detail is important be- 
cause our conclusions can be weakened by several factors. A lack of blinding can result in the 
expectations of the researcher influencing the results, while the placebo effect can confound 
the comparison between a treatment and a control group. In many instances, a more complex 
design is required to overcome difficulties and can produce more precise results. 


() LaunchPad Online Resources 


If you are having difficulty with any of the sections of this chapter, these online 
resources should help prepare you to solve the exercises at the end of this chapter. 


° StatTutor starts with a video review of each section and asks a series of 
questions to check your understanding. 


* LearningCurve provides you with a series of questions about the chapter 
geared to your level of understanding. 
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CHECK YOUR SKILLS 


9.19 A large representative random sample of 6906 U.S. 


9.20 


9.21 


9.22 


9.23 


adults collected over 20 years showed that “parents 
reported higher levels of life satisfaction than nonpar- 
ents,” with the observed difference in life satisfaction 
between the two groups being statistically significant.” 
This is an example of 


(a) an observational study. 
(b) a randomized comparative experiment. 
(c) a matched pairs experiment. 


In the study described in Exercise 9.19, we can conclude 
that 


(a) having children leads to higher levels of life 
satisfaction. We can reach this conclusion because 
we have a representative sample. 

(b) having children leads to higher levels of life 
satisfaction. We can reach this conclusion because 
we have both a large and a representative sample. 
(c) parents tend to have higher satisfaction in their 
lives than nonparents. 


What electrical changes occur in muscles as they get 
tired? Student subjects hold their arms above their 
shoulders until they have to drop them. Meanwhile, 
the electrical activity in their arm muscles is measured. 
This is 

(a) an observational study. 

(b) an uncontrolled experiment. 

(c) a randomized comparative experiment. 


Do violence and sex in television programs help sell 
products in advertisements? Subjects were randomly 
assigned to watch one of four types of TV shows: 
(1) neither sex nor violence in the content code, 
(2) violence but no sex in the content code, (3) sex 
but no violence in the content code, and (4) both 
sex and violence in the content code. For each TV 
show, the original advertisements were replaced with 
the same set of 12 advertisements. Subjects were not 
told the purpose of the study, but were instead told 
that the researchers were studying attitudes toward 
TV shows. After viewing the show, subjects received a 
surprise memory test to check their recall of the prod- 
ucts advertised.”° This experiment has 


(a) four factors, the four TV shows being compared. 
(b) 12 factors, the advertisements being shown. 

(c) two factors, with/without violent content and 
with/without sexual content. 


In the experiment of Exercise 9.22, the 336 subjects 
are labeled 001 to 336. Labels are selected at random 
by software, with the first 84 selected assigned to view 
TV show 1, the next 84 to view TV show 2, and the 
next 84 to view TV show 3. The 84 remaining subjects 
view TV show 4. This is a 


9.24 


9.25 


9.26 


(a) matched pairs design because subjects are matched 
to the TV shows. 

(b) completely randomized design. 

(c) block design with TV shows representing the four 
blocks. 


In the experiment described in Exercise 9.22, 


(a) it would have been better to have subjects 
choose the type of TV show they preferred to view 
to improve their recall and reduce confounding. 
(b) the score on the memory test of their recall of 
advertisements is the response. 

(c) the experimenters should have used different 
advertisements for each type of TV show to reduce 
confounding. 


Does exposure to aircraft noise increase the risk of 
hospitalization for cardiovascular disease in older 
people (=65 years) residing near airports? Selecting 
a random sample of approximately 650,000 Medicare 
claims, it was found that about 75,000 people had zip 
codes near airports and the remaining 575,000 did 
not. The proportions of hospital admissions related to 
cardiovascular diseases were computed for those with 
zip codes near airports and those who did not have zip 
codes near airports. A larger proportion of admissions 
for cardiovascular disease was found for older people 
living in zip codes near airports. Which of the follow- 
ing statements is correct? 


(a) Since this is an observational study, living in a 

zip code near an airport may or may not be causing 
the increase in the proportions of admissions for 
cardiovascular disease. 

(b) Because of the large sample sizes from each group, 
we can claim that living in a zip code near an airport 
is causing the increase in the proportion of admissions 
for cardiovascular disease. 

(c) Since this is an experiment, but not a randomized 
experiment, we can still conclude that living in a 
zip code near an airport is causing the increase in 
the proportions of admissions for cardiovascular 
disease. 


The Community Intervention Trial for Smoking Ces- 
sation asked whether a community-wide advertising 
campaign would reduce smoking. The researchers 
located 11 pairs of communities, with each pair simi- 
lar in location, size, economic status, and so on. One 
community in each pair was chosen at random to par- 
ticipate in the advertising campaign and the other was 
not. This is 


(a) an observational study. 
(b) a matched pairs experiment. 
(c) a completely randomized experiment. 


9.27 


9.28 


To decide which community in each pair in Exer- 
cise 9.26 should get the advertising campaign, it is 
best to 


(a) toss a coin. 

(b) choose the community that will help pay for the 
campaign. 

(c) choose the community with a mayor who will 
participate. 


A marketing class designs two videos advertising an 
expensive Mercedes sports car. They test the videos 
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by asking fellow students to view both (in random or- 
der) and say which makes them more likely to buy the 
car. Mercedes should be reluctant to agree that the 
video favored in this study will sell more cars because 


(a) the study used a matched pairs design instead 

of a completely randomized design. 

(b) results from students may not generalize to 

the older and richer customers who might buy a 
Mercedes. 

(c) this is an observational study, not an experiment. 


CHAPTER 9 EXERCISES 


In all exercises that require randomization, you may use software, 
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the Simple Random Sample applet, or Table B. See 
Example 9.6 (page 235) for directions on using the applet 
for more than two treatment groups. 


Red meat and mortality. Many studies have found 
an association between red meat consumption and an 
increased risk of chronic diseases. What is the rela- 
tionship between red meat consumption and mortal- 
ity? A large study followed 120,000 men and women 
who were free of coronary heart disease and cancer 
at the beginning of the study. Participants were asked 
detailed questions about their eating habits every four 
years, and the study spanned almost 30 years. It was 
found that the risk of dying at an early age—from 
heart disease, cancer, or any other cause, rises with the 
amount of red meat that they consumed.”! 


(a) Is this an observational study or an experiment? 
What are the explanatory and response variables? 

(b) The authors noted that, “Men and women 

with higher intake of red meat were less likely to be 
physically active and were more likely to be current 
smokers, to drink alcohol, and to have a higher body 
mass index.” Explain carefully why differences in 
these variables make it more difficult to conclude that 
higher intake of red meat explains the increased death 
rate. What are the variables physical activity, smoking 
status, drinking behavior, and body mass index called? 
(c) Suggest at least one lurking variable related to 
diet that may be confounded with higher intake of 
red meat. Explain why you chose these variables. 


Neighborhood’s effect on grades. To study the effect 
of neighborhood on academic performance, 1000 fam- 
ilies were given federal housing vouchers to move out 
of their low-income neighborhoods. No improvement 
in the academic performance of the children in the 
families was found one year after the move. Explain 
clearly why the lack of improvement in academic per- 
formance after one year does not necessarily mean that 
neighborhood does not affect academic performance. 
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In particular, identify some lurking variables whose 
effect on academic performance may be confounded 
with the effect of neighborhood. Use a figure like 
Figure 9.1 (page 229) to illustrate your explanation. 


Reducing nonresponse. How can we reduce the rate 
of refusals in telephone surveys? Most people who 
answer at all listen to the interviewer's introductory 
remarks and then decide whether to continue. One 
study made telephone calls to randomly selected 
households to ask opinions about the next election. 
In some calls, the interviewer gave her name, in oth- 
ers she identified the university she was representing, 
and in still others she identified both herself and the 
university. The study recorded what percent of each 
group of interviews was completed. Is this an obser- 
vational study or an experiment? Why? What are the 
explanatory and response variables? 


Running and sleep. Sufficient sleep is important for 
adolescents for both their neural and psychological 
development. Despite this, daytime sleepiness and 
poor physical and psychological functioning related 
to chronic sleep disturbances are common. There is 
a growing body of evidence that exercise is associated 
with both better sleep and improved psychological 
functioning. Sixty participants were recruited from a 
high school in northwestern Switzerland. They were 
randomly assigned to either a running group or a con- 
trol group, 30 to each group. The running group ran 
every morning for a little over 30 minutes on weekdays 
for a three-week period. All participants used a sleep 
log for subjective evaluation of sleep, and sleep was 
also objectively assessed at the beginning and end of 
the study using a sleep electroencephalographic device 
that measured quantities such as sleep efficiency and 
time spent in the four different sleep phases. Running 
was found to impact positively on both objective and 
subjective measures of sleep functioning.” 


(a) What are the explanatory variable(s) and the 
response variable(s)? 
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(b) Outline the design of the experiment. 

(c) Here are some more details on the treatment and 
control groups. All participants arrived at school 

at 7 A.M., and the running group did two laps on 

the track and then ran cross country in groups of 
least four people for 30 minutes. The control group 
remained seated at the track, worked on homework, 
and interacted with each other. When the runners 
returned, all participants prepared for school and ate 
a breakfast that was provided. Why do you think the 
experimenters had the control group arrive at 7 A.M., 
interact with classmates, and have breakfast together? 
Explain. Do you think having the control group do 
these activities is important for the types of conclu- 
sions that can be reached? How? 

(d) Time to sleep onset was measured before the 
beginning of the study and again at the end of the 
study for participants in both groups. Can this be 
considered a randomized controlled experiment with 
time to sleep onset as the response and four treatments 
(runners before, runners after, controls before, and 
controls after)? Explain why or why not. 
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Observation versus experiment. An LA Times 
article reported that the artery walls of people living 
within 100 meters of a highway thicken more than 
twice as fast as the average person’s.”? Researchers 
used ultrasound to measure the carotid artery wall 
thickness of 1483 people living near freeways in the 
Los Angeles area. The artery wall thickness among 
those living within 100 meters of a highway increased 
by 5.5 micrometers (roughly 1/20th the thickness of 
a human hair) each year during the three-year study, 
which is more than twice the progression observed in 
participants who did not live within this distance of a 
highway. 
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(a) The study compared artery thickening of subjects 
in the study who were living within 100 meters of a 
highway with those who were not. Without reading 
any further details of this study, how do you know 
that this was an observational study? 

(b) Suggest some variables that might differ between 
the subjects in the study living within 100 meters of a 
highway and those who were further away. Are any of 
these possible confounding variables? Explain. (Think 
about whether living very close to a highway is a 
desirable neighborhood.) 

(c) Could this study be conducted as a random- 
ized comparative experiment? What would be the 
difficulties? 


Attitudes toward homeless people. Negative atti- 
tudes toward poor people are common. Are attitudes 
more negative when a person is homeless? To find out, 
a description of a poor person is read to subjects. There 
are two versions of this description. One begins 


Jim is a 30-year-old single man. He is currently 
living in a small single-room apartment. 


The other description begins 


Jim is a 30-year-old single man. He is currently 
homeless and lives in a shelter for homeless people. 


Otherwise, the descriptions are the same. After read- 
ing the description, you ask subjects what they believe 
about Jim and what they think should be done to help 
him. The subjects are 544 adults interviewed by tele- 
phone.”* Outline the design of this experiment. 


Getting teachers to come to school. Elementary 
schools in rural India are usually small, with a single 
teacher. The teachers often fail to show up for work. 
Here is an idea for improving attendance: give the 
teacher a digital camera with a tamper-proof time and 
date stamp and ask a student to take a photo of the 
teacher and class at the beginning and end of the day. 
Offer the teacher better pay for good attendance— 
verified by the photos. Will this work? A randomized 
comparative experiment started with 120 rural schools 
in Rajasthan and assigned 60 to this treatment and 60 
to a control group. Random checks for teacher atten- 
dance showed that 21% of teachers in the treatment 
group were absent, as opposed to 42% in the control 
group.” 

(a) Outline the design of this experiment. 

(b) Label the schools, and choose the first 10 schools 
for the treatment group. If you use Table B, start at 
line 108. 


Ibuprofen and atherosclerosis. The theory of ath- 
erosclerosis (hardening and narrowing of the arteries) 
emphasizes the role of inflammation in the vascular 
walls. Since ibuprofen is known to possess a wide range 
of anti-inflammatory actions, it was hypothesized that 
it might help in the prevention of atherosclerotic 
lesion development. Both a low cholesterol and a high 
cholesterol diet were used, as the extent of atheroscle- 
rosis is also affected by diet. Thirty-two New Zealand 
rabbits served as the subjects in the experiment and, 
after three months, the percentage of the surface cov- 
ered by atherosclerotic plaques in a region of the aorta 
was evaluated. Although ibuprofen did suppress the 
expression of the MCP-1 gene thought to be related to 
atherosclerosis, it was not shown to have an effect on 
the extent of fat-induced atherosclerotic lesions.”® 


(a) Use a diagram like Figure 9.2 (page 231) to 
display the treatments in a design with two factors: 
“ibuprofen, yes or no” and “high cholesterol diet, 

yes or no.” Then outline the design of a completely 
randomized experiment to compare these treatments. 
(b) Explain how you would number the rabbits and then 
randomly assign the rabbits to the treatments. If you 

use the Simple Random Sample applet or other software, 
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assign all the rabbits. If you use Table B, start at line 128 
and assign rabbits to only the first treatment group. 


Marijuana and work. How does smoking mari- 
juana affect willingness to work? Canadian research- 
ers persuaded young adult men who used marijuana 
to live for 98 days in a “planned environment.” The 
men earned money by weaving belts. They used their 
earnings to pay for meals and other consumption and 
could keep any money left over. One group smoked 
two potent marijuana cigarettes every evening. The 
other group smoked two weak marijuana cigarettes. 
All subjects could buy more cigarettes but were given 
strong or weak cigarettes depending on their group. 
Did the weak and strong groups differ in work output 
and earnings?”’ 


(a) Outline the design of this experiment. 

(b) Here are the names of the 30 subjects. Use 
software or Table B at line 120 to carry out the 
randomization your design requires. 


Abel DeVore Kennedy Reichert Stout 
Aeffner Fleming Lamone Riddle Williams 
Birkel Fritz Mani Sawant Wilson 
Bower Géiriunas Mattos Scannell Worbis 
Burke = Glosup Molnar Sheldon = Zaccai 
Deis Heaton Newlen Simmons  Zelaski 


(c) Do you think this can be run as a double-blind 
experiment? Explain. 


c u L8r 2day: Textese and spelling ability. Textese 
is a sound-based form of spelling to reduce the time 
of text messaging and the number of characters, using 
textisms such as “2day” for “today.” Educators and 
parents have long been concerned about the effect of 
textese on spelling ability. A study of Australian chil- 
dren aged 10-12 considered the effect of text entry 
method on the spelling subtest of the Wide Range 
Achievement Test (WRAT). Multipress is the origi- 
nal entry method in which three or four letters are 
assigned a key and the key must be pressed one to 
four times to produce each letter. Predictive is a single 
key-press per letter with suggested completion of the 
word. A total of 84 children took the WRAT and were 
classified according to text entry method usually used: 
Multipress (27 students), Predictive (45 students), or 
non-texter (12 students) .” 


(a) What are the explanatory and response variables? 
(b) Is this an observational study or an experiment? 
Explain why. 

(c) The differences among the three texting methods 
were not statistically significant. What does no 
significant difference mean in describing the outcome 
of this study? 


Can low-fat food labels lead to obesity? What are 
the effects of low-fat food labels on food consumption? 
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Do people eat more of a snack food when the food is 
labeled as low-fat? The answer may depend both on 
whether the snack food is labeled low-fat and whether 
the label includes serving-size information. An experi- 
ment investigated this question using university staff, 
graduate students, and undergraduate students at a 
large university as subjects. Subjects were asked to 
evaluate a pilot episode for an upcoming TV show in 
a theater on campus and were given a cold 24-ounce 
bottle of water and a bag of granola from a respected 
campus restaurant called The Spice Box. They were 
told to enjoy as much or as little of the granola as 
they wanted. Depending on the condition randomly 
assigned to the subjects, the granola was labeled as 
either “Regular Rocky Mountain Granola” or “Low- 
Fat Rocky Mountain Granola.” Below this, the label 
indicated “Contains 1 Serving,” or “Contains 2 Serv- 
ings,” or it provided no serving-size information.” 
Twenty subjects were assigned to each treatment, and 
their granola bags were weighed at the end of the ses- 
sion to determine how much granola was eaten. 


(a) What are the factors and the treatments? How 
many subjects does the experiment require? 

(b) Outline a completely randomized design for 
this experiment. (You need not actually do the 
randomization.) 


Treating sinus infections. Sinus infections ave common, and doc- 
tors commonly treat them with antibiotics. Another treatment is to 
spray a steroid solution into the nose. A well-designed clinical trial 
found that these treatments, alone or in combination, do not reduce 
the severity or the length of sinus infections.*° Exercises 9.40 to 9.42 
concern this trial. 


9.40 Experimental design. The clinical trial was a com- 
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pletely randomized experiment that assigned 240 
patients at random among four treatments as follows: 


ANTIBIOTIC PLACEBO 
PILL PILL 
Steroid spray 53) 64 
Placebo spray 60 63 


(a) Outline the design of the experiment. 

(b) How will you label the 240 subjects? 

(c) Explain briefly how you would do the random 
assignment of patients to treatments. Assign the first 
five patients who will receive the first treatment. 


Describing the design. The report of this study in the 
Journal of the American Medical Association describes it 
as a “double-blind, randomized, placebo-controlled fac- 
torial trial.” “Factorial” means that the treatments are 
formed from more than one factor. What are the fac- 
tors? What do “double-blind” and “placebo-controlled” 
mean? 
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Checking the randomization. If the random assign- 
ment of patients to treatments did a good job of elimi- 
nating bias, possible lurking variables such as smoking 
history, asthma, and hay fever should be similar in 
all four groups. After recording and comparing many 
such variables, the investigators said that “all showed 
no significant difference between groups.” Explain to 
someone who knows no statistics what “no significant 
difference” means. Does it mean that the presence 
of all these variables was exactly the same in all four 
treatment groups? 


Liquid water enhancers? Bottled water, flavored and 
plain, is expected to become the largest segment of the 
liquid refreshment market by the end of this decade, 
surpassing traditional carbonated soft drinks.*! Kraft’s 
MiO, a liquid water enhancer, comes in a variety of 
flavors and a few drops added to water gives a zero 
calorie flavored water drink. You wonder if those who 
drink flavored water like the taste of MiO as well as 
they like the taste of a competing flavored water prod- 
uct that comes ready to drink. 


(a) Describe a matched pairs design to answer this 
question. Be sure to include proper blinding of your 
subjects. What is your response variable going to be? 
(b) You have 20 people on hand who prefer to drink 
flavored water. Use the Simple Random Sample applet, 
software, or Table B at line 138 to do the randomiza- 
tion that your design requires. 


Growing trees faster. The concentration of carbon 
dioxide (CO,) in the atmosphere is increasing rapidly 
due to our use of fossil fuels. Because green plants use 
CO, to fuel photosynthesis, more CO; may cause trees 
to grow faster. An elaborate apparatus allows research- 
ers to pipe extra CO, to a 30-meter circle of forest. We 
want to compare the growth in base area of trees in 
treated and untreated areas to see if extra CO, does 
in fact increase growth. We can afford to treat three 
circular areas.” 


(a) Describe the design of a completely randomized 
experiment using six well-separated 30-meter circular 
areas in a pine forest. Sketch the circles, and carry 
out the randomization your design calls for. 

(b) Areas within the forest may differ in soil fertility. 
Describe a matched pairs design using three pairs 

of circles that will reduce the extra variation due to 
different fertility. Sketch the circles and carry out the 
randomization your design calls for. 


Athletes taking oxygen. We often see players on the 
sidelines of a football game inhaling oxygen. Their 
coaches think this will speed their recovery. We might 
measure recovery from intense exertion as follows: 
Have a football player run 100 yards three times in 
quick succession. Then allow three minutes to rest 
before running 100 yards again. Time the final run. 
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Because players vary greatly in speed, you plan a 
matched pairs experiment using 25 football players 
as subjects. Discuss the design of such an experiment 
to investigate the effect of inhaling oxygen during the 
rest period. 


Protecting ultramarathon runners. An ultramara- 
thon, as you might guess, is a footrace longer than the 
26.2 miles of a marathon. Runners commonly develop 
respiratory infections after an ultramarathon. Will 
taking 600 milligrams of vitamin C daily reduce these 
infections? Researchers randomly assigned ultramara- 
thon runners to receive either vitamin C or a placebo. 
Separately, they also randomly assigned these treat- 
ments to a group of nonrunners the same age as the 
runners. All subjects were watched for 14 days after 
the big race to see if infections developed.» 


(a) What is the name for this experimental design? 
(b) Use a diagram to outline the design. 


Wine, beer, or spirits? There is good evidence that 
moderate alcohol use improves health. Some people 
think that red wine is better for your health than other 
alcoholic drinks. You have recruited 300 adults aged 
45-65 who are willing to follow your orders about 
alcohol consumption over the next five years. You 
want to compare the effects on heart disease of moder- 
ate drinking of just red wine, just beer, or just spirits. 
Outline the design of a completely randomized experi- 
ment to do this. (No such experiment has been done 
because subjects aren’t willing to have their drinking 
regulated for years.) 


Wine, beer, or spirits? Women as a group develop 
heart disease much later than men. We can improve 
the completely randomized design of Exercise 9.47 by 
using women and men as blocks. Your 300 subjects 
include 120 women and 180 men. Outline a block 
design for comparing wine, beer, and spirits. Be sure 
to say how many subjects you will put in each group in 
your design. 
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Quick randomizing. Here’s a quick and easy way to 
randomize. You have 100 subjects: 50 women and 
50 men. Toss a coin. If it’s heads, assign all the men 
to the treatment group and all the women to the con- 
trol group. If the coin comes up tails, assign all the 
women to treatment and all the men to control. This 
gives every individual subject a 50-50 chance of being 
assigned to treatment or control. Why isn’t this a good 
way to randomly assign subjects to treatment groups? 


Do antioxidants prevent cancer? People who eat 
lots of fruits and vegetables have lower rates of colon 
cancer than those who eat little of these foods. Fruits 
and vegetables are rich in “antioxidants” such as vita- 
mins A, C, and E. Will taking antioxidants help pre- 
vent colon cancer? A medical experiment studied this 
question with 864 people who were at risk of colon 
cancer. The subjects were divided into four groups: 
daily beta-carotene, daily vitamins C and E, all three 
vitamins every day, or daily placebo. After four years, 
the researchers were surprised to find no significant 
difference in colon cancer among the groups.*# 


(a) What are the explanatory and response variables 
in this experiment? 

(b) Outline the design of the experiment. Use your 
judgment in choosing the group sizes. 

(c) The study was double-blind. What does this mean? 
(d) What does “no significant difference” mean in 
describing the outcome of the study? 

(e) Suggest some lurking variables that could explain 
why people who eat lots of fruits and vegetables have 
lower rates of colon cancer. The experiment suggests 
that these variables, rather than the antioxidants, 
may be responsible for the observed benefits of fruits 
and vegetables. 


SAMe for depression? S-adenosyl methionine, 
(SAMe), anaturally occurring molecule found through- 
out the body, has been used as an antidepressant with 
some success. It has been available commercially in 
Europe since the late 1970s and is now available over- 
the-counter in the United States. Participants in the 
current study were 73 individuals with major depres- 
sive disorder who had not responded to a standard 
treatment using serotonin reuptake inhibitors (SRI) to 
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relieve their symptoms. The effect of augmenting their 
SRI treatment with SAMe was investigated.* 


(a) The study was a randomized, double-blind trial con- 
ducted over six weeks, with 34 participants receiving 
a placebo (dummy pills) and the remaining 39 receiv- 
ing pills containing SAMe (a trial is a medical experi- 
ment using actual patients as subjects). Explain why 
it is important to have a placebo group rather than 
having all participants receive pills containing SAMe. 
What is the purpose of the two italicized terms in the 
context of this study? 

(b) A 50% reduction in the Hamilton Rating scale for 
depression over the treatment period was considered 
a positive response to treatment. It was found that 
36.1% of the SAMe group had a positive response 
versus 17.6% in the placebo group, a statistically 
significant difference. Explain what statistical 
significance means in the context of this trial. 

(c) From the information given, use a diagram to 
outline the design of this trial. 


Randomization avoids bias. Suppose that the 25 even- 
numbered students among the 50 students available 
for the comparison of classroom and online instruc- 
tion (Example 9.5) are older, employed students. We 
hope that randomization will distribute these students 
roughly equally between the classroom and online 
groups. Use the Simple Random Sample applet to take 
20 samples of size 25 from the 50 students. These 25 
students will be the classroom instruction group. (Be 
sure to click “Reset” after each sample.) Record the 
counts of even-numbered students in each of your 20 
samples. 


(a) How many older students would you expect to see 
in the classroom instruction group? 

(b) You see that there is considerable chance varia- 
tion in the number of older (even-numbered) students 
assigned to the classroom group. Draw a stem-and- 
leaf plot of the number of older students assigned to 
the classroom group. Do you see any systematic bias 
in favor of one or the other group being assigned the 
older students? Larger samples from a larger popula- 
tion will, on the average, do an even better job of 
creating two similar groups. 
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Exploring the Web 


Smoking cessation. Go to the New England Journal of Medicine website, 
www.nejm.org, and find the article “A Randomized, Controlled Trial of Finan- 
cial Incentives for Smoking Cessation” by Volpp et al. in the February 12, 2009, 
issue. Under the Issues link, you need to go to the Browse Full Index link and then 
to the February 12, 2009, issue. You can then download the pdf of the article for 
free. Was this a comparative study? Was randomization used? How many subjects 
took part? There were 22 subjects in the control group and 64 in the incentive 
group who were still not smoking six months after they quit. What were the per- 
cents in each group? This difference is statistically significant. Explain in simple 
language what this means. 


Find an experiment. You can find the latest medical research in the Journal of 
the American Medical Association at www. jama.ama-assn.org and the New 
England Journal of Medicine at www.nejm.org. Many of the articles describe ran- 
domized comparative experiments and use the language of statistical significance 
when giving conclusions. Look through the abstracts and find an experiment of 
interest to you. If your institution has a subscription to these journals, you should 
be able to view the entire article. Otherwise, use the information in the abstract to 
answer as many of these questions as you can. What was the purpose of the experi- 
ment? How many factors were in the experiment, and what were the levels of the 
factors? What response(s) were measured? How many subjects were assigned to 
each of the treatments, and was randomization used? Was it a double-blind experi- 
ment? What were the conclusions, and were the results statistically significant? 
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Data Ethics* 


he production and use of data, like all human endeavors, raise ethical ques- 

tions. We won't discuss the telemarketer who begins a telephone sales pitch 

with, “I’m conducting a survey.” Such deception is clearly unethical. It enrages 
legitimate survey organizations, which find the public less willing to talk with them. 
Neither will we discuss those few researchers who, in the pursuit of professional 
advancement, publish fake data. There is no ethical question here—faking data 
to advance your career is just wrong.! It will end your career when uncovered. But 
just how honest must researchers be about real, unfaked data? Here is an example 
suggesting the answer is, “More honest than they often are.” 


CHAPTER 


In this chapter 
we cover... 


10.1 Institutional review boards 
10.2 Informed consent 

10.3. Confidentiality 

10.4 Clinical trials 


10.5 Behavioral and social science 
experiments 


Papers reporting scientific research are supposed to be short, with no extra baggage. 
Brevity, however, can allow researchers to avoid complete honesty about their data. 
Did they choose their subjects in a biased way? Did they report data on only some of 
their subjects? Did they try several statistical analyses and report only the ones that 
looked best? The statistician John Bailar screened more than 4000 medical papers in 
more than a decade as consultant to the New England Journal of Medicine. He says, 


*This short chapter concerns a very important topic, but the material is not needed to understand 
the rest of the book. 
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“When it came to the statistical review, it was often clear that critical information was 
lacking, and the gaps nearly always had the practical effect of making the authors’ 
conclusions look stronger than they should have.”? The situation is no doubt worse in 
fields that screen published work less carefully. This problem continues to grow with 
the proliferation of open-access online journals that “will print seemingly anything for 
a fee” and provide little or no peer review.? m 


The most complex issues of data ethics arise when we collect data from people 
(but research with animals also raises ethical issues—see Exercise 10.25). The 
ethical difficulties are more severe for experiments that impose some treatment on 
people than for sample surveys that simply gather information. Trials of new medi- 
cal treatments, for example, can do harm as well as good to their subjects. Here are 
some basic standards of data ethics that must be obeyed by all studies that gather 
data from human subjects, both observational studies and experiments. 


Basic Data Ethics for Human Subjects 


All planned studies must be reviewed in advance by an institutional review board 
charged with protecting the safety and well-being of the subjects. 

All individuals who are subjects in a study must give their informed consent before 
data are collected. 


All individual data must be kept confidential. Only statistical summaries for groups 
of subjects may be made public. 


If subjects are children, then their consent is needed in addition to that of the 
parents or guardians. 

Many journals have a formal requirement of explicitly addressing human subjects 
issues if the study is classified as human subjects research. For example, here is a 
statement from the instructions for authors for JAMA (The Journal of the American 
Medical Association):* 


For all manuscripts reporting data from studies involving human participants or 
animals, formal review and approval, or formal review and waiver, by an ap- 
propriate institutional review board or ethics committee is required and should be 
described in the Methods section. For those investigators who do not have formal 
ethics review committees, the principles outlined in the Declaration of Helsinki 
should be followed. For investigations of humans, state in the Methods section the 
manner in which informed consent was obtained from the study participants (ie, 
oral or written). Editors may request that authors provide documentation of the 
formal review and recommendation from the institutional review board or ethics 
committee responsible for oversight of the study. 


Also, the law requires that studies carried out or funded by the federal govern- 
ment obey these principles.’ But neither the law nor the consensus of experts is 
completely clear about the details of their application. 


The purpose of an institutional review board is not to decide whether a proposed 
study will produce valuable information or whether it is statistically sound. The 
board’s purpose is, in the words of one university’s board, “to protect the rights and 
welfare of human subjects (including patients) recruited to participate in research 
activities.” The board reviews the plan of the study and can require changes. It 
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reviews the consent form to ensure that subjects are informed about the nature of 
the study and about any potential risks. Once research begins, the board monitors 
the study’s progress at least once a year. See Figure 10.1 for the web page of the 
institutional review board for the Mayo Clinic. 


INSTITUTIONAL REVIEW BOARD (IRB) 


Home 


Education and Training 


Federalwide 
Assurance (FWA) 


Definitions 


Policy Manual 


Director, Mayo Clinic Office 
of Human Research 
Protection 


OVERVIEW icc 


QUALITY 


The Mayo Clinic Institutional Review Board (IRB) reviews 
all human subject research conducted at Mayo Clinic 
Florida (MCF), Mayo Clinic Rochester (MCR), or Mayo 
Clinic Arizona (MCA) and research conducted at other 
facilities under the direction of MCF, MCR, or MCA staff. A 
guarantee that all human subject research at Mayo will be 
reviewed by the IRB has been given to the U.S. 
Department of Health and Human Services (HHS) in a 
Federalwide Assurance (FWAQ00005001). 


Related Resources 


Food and Drug 
ane Administration (FDA) 
Mission Guidance for Institutional 

Review Boards and Clinical 
The primary mission of Mayo Clinic's IRB is to ensure the Investigators (FDA) 
protection of rights, privacy and welfare of all human Office for Human Research 
participants in research programs conducted by Mayo Protections (OHRP) 
Clinic and associated faculty, professional staff, and National Institutes of Health 


students. Coexistent with participant protection is the goal WH 


of providing quality service to enhance the conduct of 
research. To achieve this goal, the IRB has the authority to 
review, approve, modify or disapprove research protocols 
submitted by faculty, staff and student investigators. The 
IRB review process is guided by federal rules and 
regulations, and is based on the Protection of Human 
Subject Code of Federal Regulations, the Belmont Report 
and provisions of 45CFR46 — Protection of Human 
Subjects requiring institutions receiving federal funds to 
have all research involving human participants be 
approved by an IRB. 


Courtesy of Mayo Clinic 


FIGURE 10.1 
The web page of the Mayo Clinic's institutional review board. It begins by describing the job of such boards. 
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The most pressing issue concerning institutional review boards is whether their 
workload has become so large that their effectiveness in protecting subjects drops. 
When the government temporarily stopped human subject research at Duke 
University Medical Center in 1999 due to inadequate protection of subjects, more 
than 2000 studies were going on. That’s a lot of review work. There are shorter 
review procedures for projects that involve only minimal risks to subjects, such as 
most sample surveys. When a board is overloaded, there is a temptation to put more 
proposals in the minimal-risk category to speed the work. 


Apply Your Knowledge 


10.1 Minimal Risk? You are a member of your college’s institutional review 
board. You must decide whether several research proposals qualify for less 
rigorous review because they involve only minimal risk to subjects. Feder- 
al regulations say that “minimal risk” means the risks are no greater than 
“those ordinarily encountered in daily life or during the performance of rou- 
tine physical or psychological examinations or tests.” That’s vague. Which of 
these do you think qualifies as “minimal risk”? 


a) ‘Take hair and nail clippings in a nondisfiguring manner. 
b) Draw a drop of blood by pricking a finger to measure blood sugar. 
c) Collect data on subjects through the use of X rays or microwaves. 


d) Insert a tube that remains in the arm so that blood can be drawn regularly. 


~ 


e) ‘Take permanent teeth if routine patient care indicates a need for extraction. 


( 
( 
( 
( 
( 
( 


has) 


Take extra specimens from a subject who is undergoing an invasive 
clinical procedure such as a bronchoscopy (a procedure in which a 
physician views the inside of the airways for diagnostic and therapeutic 
purposes using an instrument that is inserted into the airways, usually 
through the nose or mouth). 


10.2 Does This Really Need to Be Reviewed? A college professor would like 
to investigate a new method for teaching statistics. He teaches two lectures. 
He will use the standard approach to teaching in one lecture, and the new 
approach in the other. Should he seek institutional review board approval 
before proceeding? Discuss. 


Both words in the phrase “informed consent” are important, and both can be 
controversial. Subjects must be informed in advance about the nature of a study 
and any risk of harm it may bring. In the case of a sample survey, physical harm is 
not possible. The subjects should be told what kinds of questions the survey will 
ask and about how much of their time it will take. Experimenters must tell subjects 
the nature and purpose of the study and outline possible risks. Subjects must then 
consent in writing. 


Are there some subjects who can’t give informed consent? It was once common, for 
example, to test new vaccines on prison inmates who gave their consent in return for 
good-behavior credit. Now we worry that prisoners are not really free to refuse, and 
the law forbids almost all medical research in prisons. 


Children can’t give fully informed consent, so the usual procedure is to ask their 
parents. A study of new ways to teach reading is about to start at a local elementary 
school, so the study team sends consent forms home to parents. Many parents don't 
return the forms. Can their children take part in the study because the parents did 
not say “no,” or should we allow only children whose parents returned the form and 
said “yes”? 

What about research into new medical treatments for people with mental disor- 
ders? What about studies of new ways to help emergency room patients who may 
be unconscious? In most cases, there is not time to get the consent of the family. 
Does the principle of informed consent bar realistic trials of new treatments for 
unconscious patients? 

These are questions without clear answers. Reasonable people differ strongly on 
all of them. There is nothing simple about informed consent.® = 


The difficulties of informed consent do not vanish even for capable subjects. 


Some researchers, especially in medical trials, regard consent as a barrier to getting 
patients to participate in research. They may not explain all possible risks; they 
may not point out that there are other therapies that might be better than those 
being studied; they may be too optimistic in talking with patients, even when the 
consent form has all the right details. On the other hand, mentioning every possible 
risk leads to very long consent forms that really are barriers. “They are like rental 
car contracts,” one lawyer said. Some subjects don’t read forms that run five or six 
printed pages. Others are frightened by the large number of possible (but unlikely) 
disasters that might happen and so refuse to participate. Of course, unlikely disas- 
ters sometimes happen. When they do, lawsuits follow, and the consent forms 
become yet longer and more detailed. 


() LaunchPad Online Resources 


* The StatBoards video, Informed Consent and Psychological Experimentation, 
discusses a real example involving issues of informed consent. 
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Apply Your Knowledge 


10.3 Coercion? The U.S. Department of Health and Human Services regu- 
lations’ for informed consent state that, “An investigator shall seek such 
consent only under circumstances that provide the prospective subject or 
the representative sufficient opportunity to consider whether or not to par- 
ticipate and that minimize the possibility of coercion or undue influence.” 
Coercion occurs when an overt or implicit threat of harm is intentionally 
presented by one person to another in order to obtain compliance. Which of 
the following circumstances do you believe constitutes coercion? Discuss. 


(a) An investigator tells a prospective subject that she or he will lose 
access to needed health services if she or he does not participate in 
the research. 


(b) An employer asks employees to participate in a research study. Although 
the employer has assured employees that participation is voluntary, 
several employees are concerned that a decision not to participate 
could affect performance evaluations or job advancement. 


10.4 Undue Influence? Undue influence in obtaining informed consent often 
occurs through an offer of an excessive or inappropriate reward or other 
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anonymity 


overture in order to obtain compliance. Which of the following circum- 
stances do you believe constitutes undue influence? Discuss. 


(a) The patients of a physician are asked to participate in a study in which 
the physician is also the investigator. 

(b) A professor asks a student to participate in a research study. He 
tells the student that everyone else in the class has agreed to 
participate. 


(c) Research subjects are to be paid in exchange for their participation. 


Ethical problems do not disappear once a study has been cleared by the review 
board, has obtained consent from its subjects, and has actually collected data about 
the subjects. It is important to protect the subjects’ privacy. Privacy refers to a 
person’s interest in controlling the access of others to himself or herself, including 
information about himself or herself. One way this is done is by keeping all data 
about individuals confidential. Confidentiality refers to the agreement between the 
investigator and participant about how data will be managed and used. The report 
of an opinion poll may say what percent of the 1200 respondents felt that legal 
immigration should be reduced. It may not report what you said about this or any 
other issue. However the investigator who collected the data will know what you 
said about this or other issues in the poll. 

Confidentiality is not the same as anonymity. Anonymity means that sub- 
jects are anonymous—their names are not known even to the director of the 
study. Anonymity provides a high degree of privacy, but anonymity is rare in sta- 
tistical studies. Even where it is possible (mainly in surveys conducted by mail), 
anonymity prevents any follow-up to improve nonresponse or inform subjects of 
results. 

Any breach of confidentiality is a serious violation of data ethics. The best 
practice is to separate the identity of the subjects from the rest of the data at 
once. Sample surveys, for example, use the identification only to check on who 
did or did not respond. In an era of advanced technology, however, it is no longer 
enough to be sure that each individual set of data protects people’s privacy. The 
government, for example, maintains a vast amount of information about citi- 
zens in many separate databases—census responses, tax returns, Social Security 
information, data from surveys such as the Current Population Survey, and so 
on. Many of these databases can be searched by computers for statistical stud- 
ies. A clever computer search of several databases might be able, by combining 
information, to identify you and learn a great deal about you, even if your name 
and other identification have been removed from the data available for search. 
A colleague from Germany once remarked that “female full professor of statistics 
with a PhD from the United States” was enough to identify her among all the 83 
million residents of Germany. Privacy and confidentiality of data are hot issues 
among statisticians in the computer age. Computer hacking and thefts of laptops 
containing data add to the difficulties. Is it even possible to guarantee confiden- 
tiality of data stored in databases that can be hacked or stolen? The U.S. Social 
Security Administration has devised a comprehensive Internet privacy policy, 
part of which can be seen in Figure 10.2. 
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Social Security Sign In to my Social Security | FAQs | Contact Us | KE wW You{{J 


USA 
%, ll The Official Website of the U.S. Social Security Administration P&P Search... 
ste 


my Social Security Retirement | Disability Survivors 


Internet Privacy Policy 


Our Commitment To You 


As a Federal agency, the Privacy Act of 1974 (5 U.S.C. § 552a) requires us to protect the 
information we collect from you. We respect your right to privacy and will protect it when you 
visit our website. We have always treated the privacy of our customers with utmost 
importance. In fact, we wrote our first regulation to ensure your privacy. You may have access 
to any of the information we collect about you at this site and we will correct any errors you may 
find. Our regulation subsection 401.40 provides information on how to get information about 
you and subsection 401.65 provides information on how to correct information about you. 


The Privacy Policy below explains our online information practices. This policy applies only to 
the information we collect from you over the Internet. This policy does not apply to third-party 
websites that you are able to reach from our website, nor does it cover other information 
collection practices within the Social Security Administration. For more information about our 
privacy practices, please visit our privacy and disclosure webpage. 


> > Our Use Of Web Measurement And Customization Technologies 


> > Other Information We May Collect 


> > Why We Collect Personal Information 


> > Sharing Your Information 


> > How We Use Your Personal Information 


> > COPPA 


> > Security 


> > Email 


> > Visiting Other Websites 


> > Social Media Sites 


Social Security Administration 


FIGURE 10.2 
The privacy policy of the government's Social Security Administration website. 


Citizens are required to give information to the government. Think of tax returns and 
Social Security contributions. The government needs these data for administrative 
purposes—to see if you paid the right amount of tax and how large a Social Security 
benefit you are owed when you retire. Some people feel that individuals should be 
able to forbid any other use of their data, even with all identification removed. This 
would prevent using government records to study, say, the ages, incomes, and house- 
hold sizes of Social Security recipients. Such a study could well be vital to debates 
on reforming Social Security. m 
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Apply Your Knowledge 


10.5 Sunshine Laws. All states in the U.S. have open records laws, sometimes 
known as “Sunshine Laws,” that give citizens access to government meetings 
and records.® This includes, for example, reports of crimes and recordings 
of 911 calls. A crime report will include the name of anyone accused of the 
crime. Suppose a 10-year-old juvenile is accused of committing a crime. A 
reporter from the local newspaper asks for a copy of the crime report. The 
sheriff refuses to provide the report because the accused is a juvenile and he 
believes the name of the accused should be confidential. Is this an issue of 
confidentiality? Discuss. 


10.6 https. Generally, secure websites use encryption and authentication stan- 
dards to protect the confidentiality of web transactions. The most commonly 
used protocol for web security has been TLS, or Transport Layer Security. 
This technology is still commonly referred to as SSL. Web sites with addresses 
beginning with https use this protocol. Do you believe that https websites 
provide true confidentiality?? Do you think it is possible to guarantee the 
confidentiality of data on any website? Discuss. 


Clinical trials are experiments that study the effectiveness of medical treatments 
on actual patients. Medical treatments can harm as well as heal, so clinical trials 
spotlight the ethical problems of experiments with human subjects. Here are the 
starting points for a discussion: 


Randomized comparative experiments are by far the best way to see the true 
effects of new treatments. Without them, risky treatments that are no more 
effective than placebos will become common.” 

Clinical trials produce great benefits, but most of these benefits go to future 
patients. The trials also pose risks, and these risks are borne by the subjects of 
the trial. So we must balance future benefits against present risks. 


Both medical ethics and international human rights standards say that “the 
interests of the subject must always prevail over the interests of science and 
society.” 

The quoted words are from the 1964 Helsinki Declaration of the World Medical 


Association, the most respected international standard. The most outrageous 
examples of unethical experiments are those that ignore the interests of the subjects. 


In the 1930s, syphilis was common among black men in the rural South, a group 
that had almost no access to medical care. The Public Health Service Tuskegee study 
recruited 399 poor black sharecroppers with syphilis and 201 others without the 
disease to observe how syphilis progressed when no treatment was given. Beginning 
in 1943, penicillin became available to treat syphilis. The study subjects were not 
treated. In fact, the Public Health Service prevented any treatment until word leaked 
out and forced an end to the study in the 1970s. 

The Tuskegee study is an extreme example of investigators following their own 
interests and ignoring the well-being of their subjects. A 1996 review said, “It 
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has come to symbolize racism in medicine, ethical misconduct in human research, 
paternalism by physicians, and government abuse of vulnerable people.” In 1997, 
President Clinton formally apologized to the surviving participants in a White House 
ceremony.'' m 


Because “the interests of the subject must always prevail,” medical treatments 
can be tested in clinical trials only when there is reason to hope that they will 
help the patients who are subjects in the trials. Future benefits aren’t enough to 
justify experiments with human subjects. Of course, if there is already strong 
evidence that a treatment works and is safe, it is unethical not to give it. Here 
are the words of Dr. Charles Hennekens of the Harvard Medical School, who 
directed the large clinical trial that showed that aspirin reduces the risk of heart 
attacks: 


There’s a delicate balance between when to do or not do a randomized trial. On 
the one hand, there must be sufficient belief in the agent’s potential to justify expos- 
ing half the subjects to it. On the other hand, there must be sufficient doubt about 
its efficacy to justify withholding it from the other half of subjects who might be 
assigned to placebos." 


Why is it ethical to give a control group of patients a placebo? Well, we know 
that placebos often work. Moreover, placebos have no harmful side effects. So in 
the state of balanced doubt described by Dr. Hennekens, the placebo group may 
be getting a better treatment than the drug group. If we knew which treatment was 
better, we would give it to everyone. When we don’t know, it is ethical to try both 
and compare them.” 


Apply Your Knowledge 


10.7 Ethics and Scientific Validity. The authors of a paper on clinical research 
and ethics" stated the following. 


For a clinical research protocol to be ethical, the methods must be valid and 
practically feasible: the research must have a clear scientific objective; be de- 
signed using accepted principles, methods, and reliable practices; have suffi- 
cient power to definitively test the objective; and offer a plausible data analysis 
plan. In addition, it must be possible to execute the proposed study. 


Do you think this rules out observational studies as “ethical”? Discuss. You 
might wish to refer to the discussion on correlation and causation in Chapter 5 
(page 148). 


When we move from medicine to the behavioral and social sciences, the direct 
risks to experimental subjects are less acute, but so are the possible benefits to the 
subjects. Consider, for example, the experiments conducted by psychologists in their 
study of human behavior. 
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Psychologists observe that people have a “personal space” and are uneasy if others 
come too close to them. We don't like strangers to sit at our table in a coffee shop 
if other tables are available, and we see people move apart in elevators if there is 
room to do so. Americans tend to require more personal space than people in most 
other cultures. Can violations of personal space have physical, as well as emotional, 
effects? 

Investigators set up shop in a men’s public restroom. They blocked off urinals to 
force men walking in to use either a urinal next to an experimenter (treatment group) 
or a urinal separated from the experimenter (control group). Another experimenter, 
using a periscope from a toilet stall, measured how long the subject took to start 
urinating and how long he continued.’ = 


© David Pollack/Corbis 


This personal space experiment illustrates the difficulties facing those who plan 
and review behavioral studies. 


= There is no risk of harm to the subjects, although they would certainly object 
to being watched through a periscope. What should we protect subjects 
from when physical harm is unlikely? Possible emotional harm? Undignified 
situations? Invasion of privacy? 


e What about informed consent? The subjects did not even know they were 
participating in an experiment. Many behavioral experiments rely on hiding 
the true purpose of the study. The subjects would change their behavior if 
told in advance what the investigators were looking for. Subjects are asked to 
consent on the basis of vague information. They receive full information only 
after the experiment. 


The “Ethical Principles” of the American Psychological Association require con- 
sent unless a study merely observes behavior in a public place. They allow deception 
only when it is necessary to the study, does not hide information that might influence 
a subject’s willingness to participate, and is explained to subjects as soon as possible. 
The personal space study (from the 1970s) does not meet current ethical standards. 

We see that the basic requirement for informed consent is understood differ- 
ently in medicine and psychology. Here is an example of another setting with yet 
another interpretation of what is ethical. The subjects get no information and give 
no consent. They don’t even know that an experiment may be sending them to jail 
for the night. 


How should police respond to domestic violence calls? In the past, the usual practice 
was to remove the offender and order him to stay out of the household overnight. 
Police were reluctant to make arrests because the victims rarely pressed charges. 
Women’s groups argued that arresting offenders would help prevent future violence 
even if no charges were filed. Is there evidence that arrest will reduce future offenses? 
That's a question that experiments have tried to answer. 
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A typical domestic violence experiment compares two treatments: arrest the 
suspect and hold him overnight, or warn the suspect and release him. When police 
officers reach the scene of a domestic violence call, they calm the participants and 
investigate. Weapons or death threats require an arrest. If the facts permit an arrest 
but do not require it, an officer radios headquarters for instructions. The person on 
duty opens the next envelope in a file prepared in advance by a statistician. The enve- 
lopes contain the treatments in random order. The police either arrest the suspect or 
warn and release him, depending on the contents of the envelope. The researchers 
then watch police records and visit the victim to see if the domestic violence recurs. 

Such experiments show that arresting domestic violence suspects does reduce 
their future violent behavior.'® As a result of this evidence, arrest has become the 
common police response to domestic violence. m 


The domestic violence experiments shed light on an important issue of public 
policy. Because there is no informed consent, the ethical rules that govern clinical 
trials and most social science studies would forbid these experiments. They were 
cleared by review boards because, in the words of one domestic violence researcher, 
“These people became subjects by committing acts that allow the police to arrest 
them. You don’t need consent to arrest someone.” 


Apply Your Knowledge 


10.8 Deceiving Subjects. Students sign up to be subjects in a psychology exper- 
iment. When they arrive, they are placed in a room and assigned a task. Dur- 
ing the task, the subject hears a loud thud from an adjacent room and then 
a piercing cry for help. Some subjects are placed in a room by themselves. 
Others are placed in a room with “confederates” who have been instructed 
by the researcher to look up on hearing the cry, then return to their task. The 
treatments being compared are whether the subject is alone in the room or in 
the room with confederates. Will the subject ignore the cry for help? 

The students had agreed to take part in an unspecified study, and the true 
nature of the experiment is explained to them afterward. Do you think this 
study is ethically okay? Discuss. 


CHAPTER 10 SUMMARY 


Chapter Specifics 


m All planned studies must be reviewed in advance by an institutional review board 
charged with protecting the safety and well-being of the subjects. 


m All individuals who are subjects in a study must give their informed consent before 
data are collected. 


gm Allindividual data must be kept confidential. Only statistical summaries for groups of 
subjects may be made public. The goal is to protect subjects’ privacy. 


Link It 


Chapters 8 and 9 discuss methods for producing data. Applying these methods in practice 
raises ethical questions. In this chapter, we present some of these ethical issues. Many ethi- 
cal questions do not have a clear, correct answer, but there are some basic standards that 
must be obeyed by all studies. 


Susumu Nishinaga/Science Source 
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| CHAPTER 10 EXERCISES 


vee of these exercises pose issues for discussion. There are no right 
or wrong answers, but there are more and less thoughtful answers. 


10.9 


10.10 


10.11 


10.12 


Who reviews? Government regulations require that 
institutional review boards consist of at least five peo- 
ple, including at least one scientist, one nonscientist, 
and one person from outside the institution. Most 
boards are larger, but many contain just one outsider. 


(a) Why should review boards contain people who 
are not scientists? 

(b) Do you think that one outside member is enough? 
How would you choose that member? (For example, 
would you prefer a medical doctor? A member of the 
clergy? An activist for patients’ rights?) 


Informed consent. A researcher suspects that peo- 
ple with fundamentalist religious beliefs tend to be 
more prone to depression. She prepares a question- 
naire that measures depression and also asks many 
religious questions. Write a description of the pur- 
pose of this research to be read by subjects to ob- 
tain their informed consent. You must balance the 
conflicting goals of not deceiving the subjects about 
what the questionnaire will tell about them, and not 
biasing the sample by scaring off people with funda- 
mentalist religious views. 


Is consent needed? In which of the following cir- 
cumstances would you allow collecting personal in- 
formation without the subjects’ consent? 


(a) A government agency takes a random sample of 
income tax returns to obtain information on the mar- 
ital status and average income of people who identify 
themselves as belonging to an ultraconservative 
political group. Only the marital status and income 
are recorded from the returns, not the names. 

(b) A social psychologist attends public meetings 

of an ultraconservative political group to study the 
behavior patterns of members. 

(c) A social psychologist pretends to be converted 
to membership of an ultraconservative political 
group and attends private meetings to study the 
behavior patterns of members. 


Studying your blood. Long ago, doctors drew a 
blood specimen from you as part of treating minor 
anemia. Unknown to you, the sample was stored. 
Now researchers plan to use stored samples from 
you and many other people 
to look for genetic factors 
that may influence anemia. 
It is no longer possible to ask 
your consent. Modern tech- 
nology can read your entire 
genetic makeup from the 
blood sample. 
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10.14 


10.15 


(a) Do you think it violates the principle of informed 
consent to use your blood sample if your name is on 
it but you were not told that it might be saved and 
studied later? 

(b) Suppose that your identity is not attached. The 
blood sample is known only to come from (say) “a 
20-year-old white female being treated for anemia.” 
Is it now okay to use the sample for research? 

(c) Perhaps we should use biological materials such 
as blood samples only from patients who have agreed 
to allow the material to be stored for later use in re- 
search. It isn’t possible to say in advance what kind 
of research, so this falls short of the usual standard 
for informed consent. Is it nonetheless acceptable, 
given complete confidentiality and the fact that us- 
ing the sample can’t physically harm the patient? 


Anonymous? Confidential? One of the most impor- 
tant nongovernment surveys in the United States is the 
National Opinion Research Center’s General Social 
Survey. The GSS regularly monitors public opinion on 
a wide variety of political and social issues. Interviews 
are conducted in person in the subject’s home. Are a 
subject’s responses to GSS questions anonymous, con- 
fidential, or both? Explain your answer. 


Anonymous or confidential? The University of 
Wisconsin LaCrosse, like many universities, offers 
free screening for HIV, the virus that causes AIDS. 
The announcement at the Student Life Office web- 
site says that in the free testing program “you are 
never asked your name, address or social security 
number, only your age and sex. You will be assigned 
a 12-digit code number which only you will know.” 
Does this practice offer anonymity or just confiden- 
tiality? Explain your answer. 


Political polls. Suppose the presidential election cam- 
paign is in full swing, and the candidates have hired 
polling organizations to take sample surveys to find 
out what the voters think about the issues. What in- 
formation should the pollsters be required to give out? 


(a) What does the standard of informed consent 
require the pollsters to tell potential respondents? 
(b) The standards accepted by polling organizations 
also require giving respondents the name and 
address of the organization that carries out the poll. 
Why do you think this is required? 

(c) The polling organization usually has a profes- 
sional name, such as “Samples Incorporated,” 

so respondents don’t know that the poll is being 
paid for by a political party or candidate. Would 
revealing the sponsor to respondents bias the 
poll? Should the sponsor always be announced 
whenever poll results are made public? 
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Making poll results public. Some people think 
that the law should require all political poll results 
to be made public. Otherwise, the possessors of poll 
results can use the information to their own advan- 
tage. They can act on the information, release only 
selected parts of it, or time the release for best ef- 
fect. A candidate’s organization replies that they are 
paying for the poll to gain information for their own 
use, not to amuse the public. Do you favor requiring 
complete disclosure of political poll results? What 
about other private surveys, such as market research 
surveys of consumer tastes? 


Student subjects. Students taking Psychology 001 
are required to serve as experimental subjects. Stu- 
dents in Psychology 002 are not required to serve, 
but they are given extra credit if they do so. Students 
in Psychology 003 are required either to sign up as 
subjects or to write a term paper. Serving as an ex- 
perimental subject may be educational, but current 
ethical standards frown on using “dependent sub- 
jects” such as prisoners or charity medical patients. 
Students are certainly somewhat dependent on their 
teachers. Do you object to any of these course poli- 
cies? If so, which ones, and why? 


The Willowbrook hepatitis studies. In the 1960s, 
children entering the Willowbrook State School, 
an institution for the intellectually disabled, were 
deliberately infected with hepatitis. The research- 
ers argued that almost all children in the institution 
quickly became infected anyway. The studies showed 
for the first time that two strains of hepatitis existed. 
This finding contributed to the development of ef- 
fective vaccines. Despite these valuable results, the 
Willowbrook studies are now considered an example 
of unethical research. Explain why, according to cur- 
rent ethical standards, useful results are not enough 
to allow a study. 


Unequal benefits. Researchers on aging proposed to 
investigate the effect of supplemental health services 
on the quality of life of older people. Eligible patients 
on the rolls of a large medical clinic were to be ran- 
domly assigned to treatment and control groups. The 
treatment group would be offered hearing aids, den- 
tures, transportation, and other services not avail- 
able without charge to the control group. The review 
board felt that providing these services to some but 
not other persons in the same institution raised ethi- 
cal questions. Do you agree? 


How many have HIV? Researchers from Yale, 
working with medical teams in Tanzania, wanted to 
know how common infection with HIV, the virus 
that causes AIDS, is among pregnant women in that 
African country. To do this, they planned to test 
blood samples drawn from pregnant women. 
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Yale’s institutional review board insisted that 
the researchers get the informed consent of each 
woman and tell her the results of the test. This is the 
usual procedure in developed nations. The Tanzani- 
an government did not want to tell the women why 
blood was drawn or tell them the test results. The 
government feared panic if many people turned out 
to have an incurable disease for which the country’s 
medical system could not provide care. The study 
was canceled. Do you think that Yale was right to 
apply its usual standards for protecting subjects? 


AIDS trials in Africa. The drug programs that treat 
AIDS in rich countries are very expensive, so some 
African nations cannot afford to give them to large 
numbers of people. Yet AIDS is more common in 
parts of Africa than anywhere else. “Short-course” 
drug programs that are much less expensive might 
help, for example, in preventing infected pregnant 
women from passing the infection to their unborn 
children. Is it ethical to compare a short-course 
program with a placebo in a clinical trial? Some 
say “no”: this is a double standard, because in rich 
countries the full drug program would be the con- 
trol treatment. Others say “yes”: the intent is to find 
treatments that are practical in Africa, and the trial 
does not withhold any treatment that subjects would 
otherwise receive. What do you think? 


Abandoned children in Romania. The study de- 
scribed in Example 9.2 (page 231) randomly assigned 
abandoned children in Romanian orphanages to 
move to foster homes or to remain in an orphanage. 
All the children would otherwise have remained in 
an orphanage. The foster care was paid for by the 
study. There was no informed consent because the 
children had been abandoned and had no adult to 
speak for them. The experiment was considered 
ethical because “people who cannot consent can be 
protected by enrolling them only in minimal-risk re- 
search, whose risks do not exceed those of everyday 
life,” and because the study “aimed to produce re- 
sults that would primarily benefit abandoned, insti- 
tutionalized children.”!” Do you agree? 


Asking teens about sex. The Centers for Disease 
Control and Prevention, in a survey of teenagers, 
asked the subjects if they had ever had sexual inter- 
course. Males who said “yes” were then asked, “That 
very first time that you had sexual intercourse with 
a female, how old were you?” and “Please tell me 
the name or initials of your first sexual partner so 
that I can refer to her during the interview.” Should 
consent of parents be required to ask minors about 
sex, drugs, and other such issues, or is consent of the 
minors themselves enough? Give reasons for your 
opinion. 
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Deceiving subjects. A psychologist conducts the 
following experiment: he measures the attitude 
of subjects toward cheating, then has them take 
a mathematics skills exam in which the subjects 
are tempted to cheat. Subjects are told that high 
scores will receive a $100 gift certificate and that 
the purpose of the experiment is to see if rewards 
affect performance. The exam is computer-based 
and multiple choice. Subjects are left alone in a 
room with a computer on which the exam is avail- 
able and are told that they are to click on the an- 
swer they believe is correct. However, when subjects 
click on an answer, a small pop-up window appears 
with the correct answer indicated. When the pop-up 


Exploring the Web 


window is closed, it is possible to change the answer 
selected. The computer records—unknown to the 
subjects—whether or not they change their answers 
after closing the pop-up window. After the exam is 
finished, attitude toward cheating is retested. 
Subjects who cheat tend to change their atti- 
tudes to find cheating more acceptable. Those who 
resist the temptation to cheat tend to condemn 
cheating more strongly on the second test of atti- 
tude. These results confirm the psychologist’s theory. 
This experiment tempts subjects to cheat. The 
subjects are led to believe that they can cheat secretly 
when in fact they are observed. Is this experiment 
ethically objectionable? Explain your position. 


10.25 Research on animals. In Section 10.4, we quoted the 1964 Helsinki Declara- 
tion of the World Medical Association as saying “the interests of the subject must 
always prevail over the interests of science and society.” This applies to experi- 
ments with human subjects. What about experiments with animals? 

(a) Do you think there should be similar or different standards for research 
involving animals? In a few sentences, state your opinion. (There are no wrong 


answeIrs.) 


(b) Visit the website www. stanford.edu/group/hopes/cgi-bin/ 


wordpress/2010/07/animal-research/. What positions regarding 
research with animals are described at this site? Which of these positions do you 
favor? 

(c) Visit either the website www.nc3rs.org.uk/page.asp?id=7 or 
the site www.ccac.ca/en_/education/niaut/stream/cs-3rs. What 
are the three Rs of humane animal experimentation? Give an example of one way 
each of the three Rs might be satisfied. 


~~ 
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Producing Data: 
Part |] Review 


n Part I of this book, you mastered data analysis, the use of graphs and numerical 
summaries to organize and explore any set of data. Part II has introduced designs 
for data production. Part III will discuss basic probability and the foundations of 
inference. Parts IV and V will deal in detail with statistical inference. 

Designs for producing data are essential if the data are intended to represent 
some wider population or process. Figures 11.1 and 11.2 display the big ideas visu- 
ally. Random sampling and randomized comparative experiments are perhaps the 
most important statistical inventions of the 20th century. Both were slow to gain 
acceptance, and you will still see many voluntary response samples and uncon- 
trolled experiments. You should now understand good designs for producing data 
and also why bad designs often produce data that are worthless for inference. The 
deliberate use of chance in producing data is a central idea in statistics. It not only 
reduces bias but also allows us to use probability, the mathematics of chance, as 
the basis for inference. 


In this chapter 
we cover... 


m Part Il Summary 
m Test Yourself 


m= Supplementary Exercises 


SHAPTER dd 
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FIGURE 11.1 
Statistics in summary. 


FIGURE 11.2 
Statistics in summary. 


Simple Random Sample 


All samples of size n Sample data 


Population = 
are equally likely Xp XO} vag Xp 


Randomized Comparative Experiment 


Group 1 
ae Treatment 1 
ny subjects 


Random Compare 
allocation response 


Group 2 
mae Treatment 2 
No subjects 


Here are the most important skills you should have acquired from reading 
Chapters 8 to 10. 


A. Sampling 


it 
De 


Identify the population in a sampling situation. 

Recognize bias due to voluntary response samples and other inferior sam- 
pling methods. 

Use software or Table B of random digits to select a simple random sample 
(SRS) from a population. 

Recognize the presence of undercoverage and nonresponse as sources of 
error in a sample survey. Recognize the effect of the wording of questions 
on the responses. 

Use software or Table B of random digits to select a stratified random 
sample from a population when the strata are identified. 


B. Experiments 


iL: 
Ds 


Recognize whether a study is an observational study or an experiment. 
Recognize bias due to confounding of explanatory variables with lurking 
variables in either an observational study or an experiment. 

Identify the factors (explanatory variables), treatments, response variables, 
and individuals or subjects in an experiment. 


Outline the design of a completely randomized experiment using a dia- 
gram like that in Figure 9.3 (page 234). The diagram in a specific case 


should show the sizes of the groups, the specific treatments, and the 
response variable. 

Use software or Table B of random digits to carry out the random assign- 
ment of subjects to groups in a completely randomized experiment. 
Recognize the placebo effect. Recognize when the double-blind technique 
should be used. 

Explain why randomized comparative experiments can give good evidence 
for cause-and-effect relationships. 


C. Data Ethics (Optional topic) 
1. Understand the purpose of institutional review boards. 


2. Understand what informed consent means. 


3. Explain the difference between confidentiality and anonymity in research 


studies. 


if 1 


The following questions include both multiple-choice and short-answer questions and calculations. 
They will help you review the basic ideas and skills presented in Chapters 8-10. 


Elephants and bees. Elephants sometimes damage crops in Africa. It turns out that elephants 
dislike bees. They recognize beehives in areas where they are common and avoid them. Can this 
be used to keep elephants away from trees? A group in Kenya placed active beehives in some trees 
and empty beehives in others, whereas others received no beehives.! Will elephant damage be less 
in trees with hives? Will even empty hives keep elephants away? Use this information to answer 
Questions 11.1 and 11.2. 


11.1 


11.2 


11.3 


This experiment has 


(a) two factors, beehives present or absent. 
(b) matched pairs. 

(c) three treatments. 

(d) stratification by beehive. 


The response in this experiment is 


(a) the type of crop. 

(b) the presence or absence of bees. 
(c) the presence or absence of hives. 
(d) elephant damage. 


Do you trust the Internet? You want to ask a sample of college students the 
question, “How much do you trust information about health that you find on the 
Internet—a great deal, somewhat, not much, or not at all?” You try out this and 
other questions on a pilot group of six students chosen from your class. The class 
members are 


Adams Devore Guo Newberg Shoepf 
Aeffner Ding Heaton Paulsen  Spagnola 
Barnes Drake Huling Payton Terry 
Bower Eckstein Kahler Prince Vore 
Burke — Fassnacht Kessis — Pulak Wallace 
Cao Fullmer Lu Rabin Wanner 
Cisse Gandhi Mattos Roberts Zhang 


(a) Describe how you will label the students to select the sample. 

(b) Use the Simple Random Sample applet, other software, or Table B, beginning at 
line 115, to select the six students in the sample. 

(c) What is the response variable in this study? 


Test Yourself 


269 


wistiCS 
ICING THE KICKER 


0 

dtd The football team lines 
up for what they hope will be the 
winning field goal... and the other 
team calls time-out. “Make the 
kicker think about it” is their motto. 
Does “icing the kicker” really work? 
That is, does the probability of 
making a field goal go down when 
the kicker must wait around during 
the time-out? This isn’t a simple 
question. A detailed statistical study 
considered the distance, the weather, 
the kicker’s skill, and so on. The 
conclusion is cheering to coaches: 
yes, icing the kicker does reduce the 
probability of success. 
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American Community Survey. Each month the U.S. Census Bureau’s American Community 
Survey mails survey forms to 300,000 households asking questions about demographic, social, 
economic, and housing characteristics such as mortgage and utility costs. Telephone calls are made 
to households that don’t return the form. In one month, responses were obtained from 295,000 of 
the households contacted. Use this information to answer Questions 11.4 to 11.6. 


11.4 The sample is 


(a) the 300,000 households initially contacted. 
(b) the 295,000 households that responded. 
(c) the 5,000 households that did not respond. 
(d) all U.S. households. 


11.5 The population of interest is 


(a) all households with mortgages. 

(b) the 300,000 households contacted. 
(c) only U.S. households with phones. 
(d) all U.S. households. 


11.6 A source of bias in this survey is 


(a) voluntary response. 

(b) nonresponse. 

(c) the fact that the survey was not double-blind. 
(d) that only U.S. households were contacted. 


11.7 The Nurses’ Health Study has interviewed a sample of more than 100,000 female 
registered nurses every two years since 1976. The study finds that “light-to- 
moderate drinkers had a significantly lower risk of death” than either nondrinkers 
or heavy drinkers. The Nurses’ Health Study is 


(a) an observational study. 
(b) an experiment. 
(c) uncategorizable without more information. 


11.8 Ata local health club, a researcher samples 75 people whose primary exercise 
is cardiovascular and 75 people whose primary exercise is strength training. The 
researcher’s objective is to assess the effect of type of exercise on cholesterol. Each 
subject reported to a clinic to have his or her cholesterol measured. The subjects 
were unaware of the purpose of the study, and the technician measuring the 
cholesterol was not aware of the subjects’ type of exercise. This is 


(a) an observational study. 

(b) an experiment, but not a double-blind experiment. 
(c) a double-blind experiment. 
d) 


(d) a matched pairs experiment. 


11.9 A common definition of “binge drinking” is five or more drinks at one setting for 
men, and four or more for women. An observational study finds that students who 
binge have lower average GPA than those who don’t. Suggest two lurking variables 
that may be confounded with binge drinking, and be sure to give a reason why you 
have chosen each of these variables. The possibility of confounding means that we 
can’t conclude that binge drinking causes lower GPA. 


11.10 In 2000, when the federal budget showed a large surplus, the Pew Research Center 
asked random samples of adults two questions about using the remaining surplus. 
Both questions stated that Social Security would be “fixed.” 


Question A: Should the money be used for a tax cut, or should it be used to fund 
new government programs? 


Question B: Should the money be used for a tax cut, or should it be spent 
on programs for education, the environment, health care, crime-fighting, and 
military defense? 


One of these questions drew 60% favoring a tax cut. The other drew only 22%. 
Which wording pulls respondents toward a tax cut? Why? 


Effects of TV advertising. What are the effects of repeated exposure to an advertising mes- 
sage? The answer may depend both on the length of the ad and on how often it is repeated. To 
investigate this question, assign undergraduate students (the subjects) to view a 40-minute tele- 
vision program that includes commercials for a digital camera. There are three versions of the 
commercial: a 30-second commercial, a 45-second commercial, and a 60-second commercial. For 
each version of the commercial, some subjects will see the commercial once during the program and 
others will see the commercial twice during the program. After watching the TV program, all the 
subjects fill out a questionnaire that produces an “intention to buy” score with values between 0 
and 100. Use this information to answer Questions 11.11 and 11.12. 


11.11 


11.12 


11.13 


11.14 


11.15 


11.16 


The number of treatments in this study is 


The response in this experiment is 


(a) the length of time of the commercial. 

(b) the number of times the commercial is shown. 
(c) the “intention to buy” score. 

(d) the type of television program shown. 


The website of the PBS television program NOVA Science Now invites view- 
ers to vote on issues such as re-creating the virus responsible for the deadly flu 
epidemic of 1918. This online poll is unusual in offering detailed arguments 
for both sides. Of the 790 viewers who read the arguments and voted, 64% 
said that re-creating the virus was justified.” Explain to someone who knows 
no statistics why these 790 responses probably don’t represent the opinions of 
all American adults. 


A study attempts to determine whether a football filled with helium travels far- 
ther when kicked than one filled with air. Each subject kicks twice: once with a 
football filled with helium, and once with a football filled with air. The order of 
the type of football kicked is randomized. This is an example of 


(a) a matched pairs experiment. 

(b) a randomized controlled experiment. 
(c) a stratified experiment. 

(d) the placebo effect. 


In many countries, it has been found that people with higher IQs tend to have a 
greater incidence of myopia (nearsightedness). One of the more recent studies in 
Singapore looked at a sample of 1453 children aged 10 to 12 years. Participants 
were given both a nonverbal IQ test and an eye exam.’ This is as an example of 


(a) a two factor study with factors IQ and myopia. 

(b) an observational study. 

(c) a single blind experiment as only the experimenters were unaware of the 
children’s IQs and vision. 

(d) a matched pairs experiment. 


(Optional topic) A researcher promises subjects that he will only release statis- 
tical summaries of the results of the study to the public, although he will privately 
retain information about your data. This researcher is promising subjects 

(a) anonymity. 

(b) confidentiality. 

(c) minimal risk. 
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11.17 (Optional topic) Informed consent means 


(a) subjects are informed in advance about the nature of a study and any risk of 


harm it may bring. 


(b) subjects are volunteers and are adults. 
(c) subjects have been interviewed by an institutional review board prior to the 


start of the study. 


_ Supplementary Exercises 


Supplementary exercises apply the skills you have learned in ways 
that require more thought or more elaborate use of technology. 


11.18 Sampling students. You want to investigate the 


attitudes of students at your school toward the school’s 
policy on sexual harassment. You have a grant that 
will pay the costs of contacting about 500 students. 


(a) Specify the exact population for your study. For 
example, will you include part-time students? 

(b) Describe your sample design. Will you use a 
stratified sample? 

(c) Briefly discuss the practical difficulties that you 
anticipate. For example, how will you contact the 
students in your sample? 
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11.19 The placebo effect. A survey of physicians found 


that some doctors give a placebo to a patient who 
complains of pain for which the physician can find 
no cause. If the patient’s pain improves, these doctors 
conclude that it had no physical basis. The medical 
school researchers who conducted the survey claimed 
that these doctors do not understand the placebo 
effect. Why? 


11.20 Informed consent. The requirement that human 


subjects give their informed consent to participate in 
an experiment can greatly reduce the number of avail- 
able subjects. For example, a study of new teaching 
methods asks the consent of parents for their children 
to be taught by either a new method or the standard 
method. Many parents do not return the forms, so their 
children must continue to follow the standard curricu- 
lum. Why is it not correct to consider these children as 
part of the control group along with children who are 
randomly assigned to the standard method? 


11.21 Fixing health care. The cost of health care and 


health insurance is the biggest health concern among 


Americans, even ahead of cancer and other diseases. 
Changing to a national government health insurance 
system is controversial. An opinion poll will give differ- 
ent results depending on the wording of the question 
asked. For each of the following claims, say whether 
including it in the question would increase or decrease 
the percent of a poll sample who support a govern- 
ment health insurance system. 


(a) A national system would mean that everybody 
has health insurance. 

(b) A national system would probably require an 
increase in taxes. 

(c) Eliminating private insurance companies and 
their profits would reduce insurance costs. 

(d) A national system would limit the medical 
treatments available to contain costs. 


11.22 Market research. Stores advertise price reductions 


to attract customers. What type of price cut is most 
attractive? Market researchers prepared ads for ath- 
letic shoes announcing different levels of discounts 
(20%, 40%, or 60%). The student subjects who read 
the ads were also given “inside information” about the 
fraction of shoes on sale (50% or 100%). Each subject 
then rated the attractiveness of the sale on a scale of 
Iktowe: 


(a) There are two factors. Make a sketch like Figure 
9.2 (page 231) that displays the treatments formed 
by all combinations of levels of the factors. 

(b) Outline a completely randomized design using 

60 student subjects. Use software or Table B at 

line 111 to choose the subjects for the first treatment. 


11.23 Making french fries. Few people want to eat discol- 


ored french fries. Potatoes are kept refrigerated before 
being cut for french fries to prevent spoiling and pre- 
serve flavor. But immediate processing of cold pota- 
toes causes discoloring due to complex chemical reac- 
tions. The potatoes must therefore be brought to room 
temperature before processing. Design an experiment 
in which tasters will rate the color and flavor of french 
fries prepared from several groups of potatoes. The 
potatoes will be freshly picked or stored for a month at 
room temperature or stored for a month refrigerated. 
They will then be sliced and cooked either immedi- 
ately or after an hour at room temperature. 
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Subjects work until told to stop (after six minutes, but 
subjects do not know this). They are then asked to 
estimate how long they worked. The psychologist has 
30 students available to serve as subjects. 


(a) Describe the design of a completely randomized 
experiment to learn the effect of difficulty on estimated 
time. 

(b) Describe the design of a matched pairs experiment 
using the same 30 subjects. 


11.25 Alcohol and heart attacks. Many studies have 
found that people who drink alcohol in moderation 
have lower risk of heart attack than either nondrink- 
ers or heavy drinkers. Does alcohol consumption also 
improve survival after a heart attack? One study fol- 
lowed 1913 people who were hospitalized after severe 
heart attacks. In the year before their heart attacks, 
47% of these people did not drink, 36% drank moder- 
ately, and 17% drank heavily. After four years, fewer 
of the moderate drinkers had died.’ 
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(a) What are the factors and their levels, the treat- 
ments, and the response variables? 

(b) Describe and outline the design of this experiment. 
(c) It is efficient to have each taster rate fries from 


all treatments. How will you use randomization in (a) Is this an observational study or an experiment? 
presenting fries to the tasters? Why? What are the explanatory and response 
variables? 


11.24 How long did | work? A psychologist wants to know 
if the difficulty of a task influences our estimate of how 
long we spend working at it. She designs two sets of 
mazes that subjects can work through on a computer. 
One set has easy mazes, and the other has hard mazes. 


(b) Suggest some lurking variables that may be 
confounded with the drinking habits of the subjects. 
The possible confounding makes it difficult to 
conclude that drinking habits explain death rates. 
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From Data 
Production to 
Inference 


rmed with designs for producing trustworthy data, we continue our 

journey toward statistical inference. Exploratory data analysis allows 

us to examine the data obtained from sampling or experiments, but 

simply describing or looking for patterns in the data at hand is often 
not the primary goal. Usually data are used to answer specific questions, posed 
before the data are collected. If the sample has been selected using the prin- 
ciples presented in Chapter 8, the sample can tell us about important aspects 
of the population from which it was obtained. In a comparative experiment, 
the data can indicate how strong the evidence is that our treatment would be 
superior to the placebo for a broader class of circumstances. 

Generalizing the results of sampling or experiments to a larger group of indi- 
viduals or a broader class of circumstances is one goal of statistical inference. 
The conclusions of inference use the language of probability, the mathematics 
of chance. Chapters 12 and 15 present the ideas we need, and the optional 
Chapters 13 and 14 add more detail. Armed with designs for producing trust- 
worthy data, data analysis to examine the data, and the language of probability, 
we are prepared to understand the big ideas of inference in Chapters 16, 17, 
and 18. These chapters are the foundation for the discussion of inference in 
practice that occupies the rest of the book. 


PART 


PROBABILITY 
AND SAMPLING 
DISTRIBUTIONS 


CHAPTER 12 Introducing 
Probability 


CHAPTER 13. General 
Rules of Probability* 


CHAPTER 14 Binomial 
Distributions* 


CHAPTER 15 Sampling 
Distributions 


FOUNDATIONS OF 
INFERENCE 


CHAPTER 16 Confidence 
Intervals: The Basics 


CHAPTER 17 Tests of 
Significance: The Basics 


CHAPTER 18 Inference 
in Practice 


CHAPTER 19 From Data 
Production to Inference: 
Part Ill Review 
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In this chapter 


ntroducing Probability —wessee- 


12.1. The idea of probability 


12.2. The search for randomness* 
12.3 Probability models 
12.4 Probability rules 


12.5 Finite and discrete probability 
models 


hy is probability, the mathematics of chance behavior, needed to under- 12.6 Continuous probability models 
stand statistics, the science of data? Let’s look at a typical sample survey. 
12.7 Random variables 


12.8 Personal probability* 


EXAMPLE 12.1 Do You Own a Gun? 


What proportion of all U.S. adults own a gun? We don’t know, but we do have results 
from the Gallup Poll. Gallup took a random sample of 1005 adults. The poll found 
that 342 of the people in the sample own a gun. The proportion who own a gun is 


: 342 . 
sample proportion = cous 0.34 (thatis, 34%) 


If the sample was a simple random sample of all adults,' then all adults had the same 
chance to be among the chosen 1005. It would be reasonable to use this 34% as 
an estimate of the unknown proportion in the population. It’s a fact that 34% of the 
sample claimed to own a gun—we know because Gallup asked them. We don’t know 
what percent of all adults own a gun, but we estimate that about 34% did. This is a 
basic move in statistics: use a result from a sample to estimate something about a 
population. = 
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What if Gallup took a second random sample of 1005 adults? The new sample 
would have different people in it. It is almost certain that there would not be exactly 
342 positive responses. That is, Gallup’s estimate of the proportion of adults who 
own a gun will vary from sample to sample. Could it happen that one random 
sample finds that 34% of adults own a gun, and a second random sample finds that 

56% own a gun? Random samples eliminate bias from the act of choosing a 
A sample, but they can still be wrong because of the variability that results when 

we choose at random. If the variation when we take repeated samples from 
the same population is too great, we can’t trust the results of any one sample. 

This is where we need facts about probability to make progress in statistics. Because 
Gallup uses chance to choose its samples, the laws of probability govern the behavior 
of the samples. Gallup says that the probability is 0.95 that an estimate from one of 
its samples comes within +4 percentage points of the truth about the population of 
all adults. The first step toward understanding this statement is to understand what 
“probability 0.95” means. Our purpose in this chapter is to understand the language 
of probability, but without going into the full mathematics of probability theory. 


To understand why we can trust random samples and randomized comparative 
experiments, we must look closely at chance behavior. The big fact that emerges is 
this: chance behavior is unpredictable in the short run but has a regular and 
predictable pattern in the long run. 

Toss a coin, or choose a random sample. The result can’t be predicted in advance 
because the result will vary when you toss the coin or choose the sample repeatedly. 
But there is still a regular pattern in the results, a pattern that emerges clearly only 
after many repetitions. This remarkable fact is the basis for the idea of probability. 


When you toss a coin, there are only two possible outcomes: heads or tails. Figure 12.1 
shows the results of tossing a coin 5000 times, twice. For each number of tosses from 
1 to 5000, we have plotted the proportion of those tosses that gave a head. Trial A 
(solid red line) begins tail, head, tail, tail. You can see that the proportion of heads for 
Trial A starts at O on the first toss, rises to 0.5 when the second toss gives a head, 
then falls to 0.33 and 0.25 as we get two more tails. Trial B, on the other hand, starts 
with five straight heads, so the proportion of heads is 1 until the sixth toss. 

The proportion of tosses that produce heads is quite variable at first. Trial A starts 
low, and Trial B starts high. As we make more and more tosses, however, the propor- 
tion of heads for both trials gets close to 0.5 and stays there. If we made yet a third 
trial at tossing the coin a great many times, the proportion of heads would again 
settle down to 0.5 in the long run. This is the intuitive idea of probability. Probability 
0.5 means “occurs half the time in a very large number of trials.” The probability 0.5 
appears as a horizontal line on the graph. = 


We might suspect that a coin has probability 0.5 of coming up heads just because 
the coin has two sides. But we can’t be sure. In fact, spinning a penny on a flat 
surface, rather than tossing the coin, gives heads probability about 0.45 rather than 
0.5.2 The idea of probability is empirical. That is, it is based on observation rather 
than theorizing. Probability describes what happens in very many trials, and we 
must actually observe many trials to pin down a probability. In the case of tossing a 
coin, some diligent people have in fact made thousands of tosses. 
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FIGURE 12.1 
yp === The proportion of tosses of a coin that 
E give a head changes as we make more 
0.9 4 \ tosses. Eventually, however, the propor- 
Ea tion approaches 0.5, the probability of 
0.8 > \ a head. This figure shows the results of 


Ora two trials of 5000 tosses each. 
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The French naturalist Count Buffon (1707-1788) tossed a coin 4040 times. Result: 
2048 heads, or proportion 2048/4040 = 0.5069 for heads. 

Around 1900, the English statistician Karl Pearson heroically tossed a coin 
24,000 times. Result: 12,012 heads, a proportion of 0.5005. 

While imprisoned by the Germans during World War Il, the South African math- 
ematician John Kerrich tossed a coin 10,000 times. Result: 5067 heads, a proportion 
of 0.5067. = 


Randomness and Probability 


We call a phenomenon random if individual outcomes are uncertain but there is 
nonetheless a regular distribution of outcomes in a large number of repetitions. 


The probability of any outcome of a random phenomenon is the proportion of times 
the outcome would occur in a very long series of repetitions. 


The best way to understand randomness is to observe random behavior, as in 
Figure 12.1. You can do this with physical devices like coins, but computer simu- 
lations (imitations) of random behavior allow faster exploration. The Probability eS 
applet is a computer simulation that animates Figure 12.1. It allows you to choose 
the probability of a head and simulate any number of tosses of a coin with that prob- 
ability. Experience shows that the proportion of heads gradually settles down close 

to the probability. Equally important, it also shows that the proportion in a 
A small or moderate number of tosses can be far from the probability. Probability 

describes only what happens in the long run. Of course, we can never observe 
a probability exactly. We could always continue tossing the coin, for example. 
Mathematical probability is an idealization based on imagining what would happen 
in an indefinitely long series of trials. 
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: int DOES GOD PLAY 
iB DICE? 

Few things in the world are truly 
random in the sense that no amount 
of information will allow us to predict 
the outcome. But, according to the 
branch of physics called quantum 
mechanics, randomness does rule 
events inside individual atoms. 
Although Albert Einstein helped 
quantum theory get started, he 
always insisted that nature must 
have some fixed reality, not just 
probabilities. “I shall never believe 
that God plays dice with the world,” 
said the great scientist. A century 
after Einstein's first work on quantum 
theory, it appears that he was wrong. 


Introducing Probability 


Random numbers are valuable. They are used to choose random samples, to 
shuffle the cards in online poker games, to encrypt our credit card numbers when 
we buy online, and as part of simulations of the flow of traffic and the spread of 
epidemics. Where does randomness come from, and how can we get random num- 
bers? We defined randomness by how it behaves: unpredictably in the short run, but 
showing a regular pattern in the long run. Probability describes the long-run regular 
pattern. That many things are random in this sense is an observed fact about the 
world. Not all these things are “really” random. Here’s a quick tour of how to find 
random behavior and get random numbers. 

The easiest way to get random numbers is from a computer program. Of course, a 
computer program just does what it is told to do. Run the program again, and you 
get exactly the same result. The random numbers in Table B, the outcomes of the 
Probability applet, and the random numbers that shuffle cards for online poker come 
from computer programs, so they aren't “really” random. Clever computer programs 
produce outcomes that look random even though they really aren’t. These pseudo- 
random numbers are more than good enough for choosing samples and shuffling 
cards. But they may have hidden patterns that can distort scientific simulations. 

You might think that physical devices such as coins and dice produce really random 
outcomes. But a tossed coin obeys the laws of physics. If we knew all the inputs of the 
toss (forces, angles, and so on), then we could say in advance whether the outcome 
will be heads or tails. The outcome of a toss is predictable rather than random. Why 
do the results of tossing a coin look random? The outcomes are extremely sensitive 
to the inputs, so that very small changes in the forces you apply when you toss a coin 
change the outcome from heads to tails and back again. In practice, the outcomes are 
not predictable. Probability is a lot more useful than physics for describing coin tosses. 

We call a phenomenon with “small changes in, big changes out” behavior 
chaotic. If we can feed chaotic behavior into a computer, we can do better than 
pseudorandom numbers. Coins and dice are awkward, but you can go to the 
website www.random.org to get random numbers from radio noise in the atmo- 
sphere, a chaotic phenomenon that is easy to feed to a computer. 

Is anything really random? As far as current science can say, behavior inside 
atoms really is random—that is, there isn’t any way to predict behavior in advance 
no matter how much information we have. It was this “really, truly random” idea 
that Einstein disliked as he watched the new science of quantum mechanics 
emerge. Really, truly random numbers generated from the radioactive decay of 
atoms is available at the HotBits website, www. fourmilab.ch/hotbits. 
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* The Snapshots video, Probability, discusses some examples of probability 
and its interpretation. 


Apply Your Knowledge 


12.1 A Straight Flush. You read online that the probability of being dealt a 
straight flush in a five-card poker hand is 1/64,974. Explain carefully what 
this means. In particular, explain why it does not mean that if you are dealt 
64,974 five-card poker hands, one will be a straight flush. 


*This short discussion is optional. 
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12.2 Probability Says ... Probability is a measure of how likely an event is to 
occur. Match one of the probabilities that follow with each statement of like- 
lihood given. (The probability is usually a more exact measure of likelihood 
than is the verbal statement.) 


O 0.01 045 0.50 0.55 0.99 1 
(a) This event is impossible. It can never occur. 
(b) This event is certain. It will occur on every trial. 
(c) This event is very likely, but it will not occur once in a while in a long 
sequence of trials. 


(d) This event will occur slightly less often than not. 


12.3 Random Digits. The table of random digits (Table B) was produced by a 

random mechanism that gives each digit probability 0.1 of being a 0. 

(a) What proportion of the first 200 digits (those in the first five lines) in 
the table are Os? This proportion is an estimate, based on 200 repeti- 
tions, of the true probability, which we know is 0.1. 

(b) The Probability applet can imitate random digits. Set the probability 
of heads in the applet to 0.1. Check “Show true probability” to show 
this value on the graph. A head stands for a 0 in the random digit 
table, and a tail stands for any other digit. Simulate 200 digits (set the 
“Number of Tosses” to 200 and click on “Toss”). What was the result 
of your 200 tosses? 


12.4 The Long Run but Not the Short Run. Our intuition about chance behav- 
ior is not very accurate. In particular, we tend to expect that the long-run pat- 
tern described by probability will show up in the short run as well. For example, 
we tend to think that tossing a coin 10 times will give close to five heads. 

(a) Set the probability of heads in the Probability applet to 0.5 and the 
number of tosses to 10. Click “Toss” to simulate 10 tosses of a balanced 
coin. What was the proportion of heads? 

(b) Click “Reset” and toss again. The simulation is fast, so do it 25 times and 
keep a record of the proportion of heads in each set of 10 tosses. Make a 
stemplot of your results. You see that the result of tossing a coin 10 times is 
quite variable and need not be very close to the probability 0.5 of heads. 


Gamblers have known for centuries that the fall of coins, cards, and dice displays 
clear patterns in the long run. The idea of probability rests on the observed fact 
that the average result of many thousands of chance outcomes can be known with 
near certainty. How can we give a mathematical description of long-run regularity? 

To see how to proceed, think first about a very simple random phenomenon, 
tossing a coin once. When we toss a coin, we cannot know the outcome in advance. 
What do we know? We are willing to say that the outcome will be either heads or 
tails. We believe that each of these outcomes has probability 1/2. This description 
of coin tossing has two parts: 


e A list of possible outcomes 


= A probability for each outcome 


Such a description is the basis for all probability models. Here is the basic vocabulary 
we use. 
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Probability Models 


The sample space Sof a random phenomenon is the set of all possible outcomes. 


An event is an outcome or a set of outcomes of a random phenomenon. That is, an 
event is a subset of the sample space. 


A probability model is a mathematical description of a random phenomenon con- 
sisting of two parts: a sample space S and a way of assigning probabilities to events. 


A sample space S can be very simple or very complex. When we toss a coin once, 
there are only two outcomes: heads and tails. The sample space is S = {H, T}. 
When Gallup draws a random sample of 1005 adults, the sample space contains 
all possible choices of 1005 of the 241 million adults in the United States. This S 
is extremely large. Each member of S is a possible sample, so S is the collection or 
“space” of all possible samples. This explains the term sample space. 


Rolling two dice is a common way to lose money in casinos. There are 36 possible 
outcomes when we roll two dice and record the up-faces in order (first die, second 
die). Figure 12.2 displays these outcomes. They make up the sample space S. “Roll a 
5” is an event, call it A, that contains four of these 36 outcomes: 


Go Be Bo BB 


How can we assign probabilities to this sample space? We can find the actual 
probabilities for two specific dice only by actually tossing the dice many times, and 
even then only approximately. So we will give a probability model that assumes ideal, 
perfectly balanced dice. This model will be quite accurate for carefully made casino 
dice and less accurate for the cheap dice that come with a board game. 


FIGURE 12.2 

The 36 possible outcomes in rolling 
two dice. If the dice are carefully 
made, all these outcomes have the 
same probability. 


If the dice are perfectly balanced, all 36 outcomes in Figure 12.2 will be equally 
likely. That is, each of the 36 outcomes will come up on one thirty-sixth of all rolls in 
the long run. So each outcome has probability 1/36. There are four outcomes in the 
event A (“roll a 5”), so this event has probability 4/36. In this way we can assign a 
probability to any event. So we have a complete probability model. m= 


In general, if all outcomes in a sample space are equally likely, we find the prob- 
ability of any event by 


number of ways the event could occur 


total number of outcomes in the sample space 


sample space for rolling two dice and counting the spots is 
S = {2, 3, 4,5, 6, 7, 8, 9, 10, 11, 12} 


A ing the detailed description of the random phenomenon we are describing. 


from Figure 12.2) has probability 4/36. Here is the complete probability model: 


Gamblers care only about the total number of spots on the up-faces of the dice. The 


Comparing this S with Figure 12.2 reminds us that we can change S by chang- 


What are the probabilities for this new sample space? The 11 possible 
outcomes are not equally likely because there are six ways to roll a 7 and only one 
way to roll a 2 or a 12. That’s the key: each outcome in Figure 12.2 has probability 
1/36. So “roll a 7” has probability 6/36 because this event contains six of the 36 
outcomes. Similarly, “roll a 2” has probability 1/36, and “roll a 5” (four outcomes 


Total Spots 2 3 4 B 6 7 8 9 10 11 12 


Probability 1/36 2/36 3/36 4/36 5/6 6/36 5/36 4/36 3/36 2/36 1/36 


Apply Your Knowledge 


12.5 Sample Space. Choose a student at random from a large statistics class. 
Describe a sample space S for each of the following. (In some cases, you may 
have some freedom in specifying S.) 


(a) Does the student live on campus or off campus? 

(b) What is the student’s age in years? 

(c) What are the last four digits of the student’s cell phone number? 
(d) Record the student’s letter grade at the end of the course. 


12.6 Role-Playing Games. Computer games in which the players take the roles 
of characters are very popular. They go back to earlier tabletop games such 
as Dungeons & Dragons. These games use many different types of dice. A 
four-sided die has faces with one of the numbers 1, 2, 3, or 4 appearing at the 
bottom of each visible face. 


(a) What is the sample space for rolling a four-sided die twice (numbers 
on first and second rolls)? Follow the example of Figure 12.2. 

(b) What is the assignment of probabilities to outcomes in this sample 
space? Assume that the die is perfectly balanced, and follow the 
method of Example 12.4. 

12.7 Role-Playing Games. The intelligence of a character in a game is deter- 
mined by rolling the four-sided die twice and adding 1 to the sum of the 
numbers. Start with your work in Exercise 12.6 to give a probability model 
(sample space and probabilities of outcomes) for the character’s intelligence. 
Follow the method of Example 12.5. 


In Examples 12.4 and 12.5, we found probabilities for tossing dice. As random 
phenomena go, dice are pretty simple. Even so, we had to assume idealized, perfectly 
balanced dice. In most situations, it isn’t easy to give a “correct” probability model. 
We can make progress by listing some facts that must be true for any assignment of 
probabilities. These facts follow from the idea of probability as “the long-run propor- 


tion of repetitions on which an event occurs.” 


12.4 Probability Rules 283 


Slpix/Dreamstime.com 


284 


pists 5 
EQUALLY LIKELY? 


i 

A f bridge begins 

game oO ge beg’ 

by dealing all 52 cards in the deck 
to the four players, 13 to each. If the 
deck is well shuffled, all the immense 
number of possible hands will be 
equally likely. But don’t expect the 
hands that appear in newspaper 
bridge columns to reflect the equally 
likely probability model. Writers on 
bridge choose “interesting” hands, 
especially those that lead to high bids 
that are rare in actual play. 


CHAPTER 12 @ Introducing Probability 


1. Any probability is a number between 0 and 1. Any proportion is a number 
between 0 and 1, so any probability is also a number between 0 and 1. An event 
with probability 0 never occurs, and an event with probability 1 occurs in every 
trial. An event with probability 0.5 occurs in half the trials in the long run. 


2. All possible outcomes together must have probability 1. Because some out- 
come must occur on every trial, the sum of the probabilities for all possible 
outcomes must be exactly 1. 


3. If two events have no outcomes in common, the probability that one or 
the other occurs is the sum of their individual probabilities. If one event 
occurs in 40% of all trials, a different event occurs in 25% of all trials, and the 
two can never occur together, then one or the other occurs on 65% of all trials 
because 40% + 25% = 65%. 


4. The probability that an event does not occur is 1 minus the probability 
that the event does occur. If an event occurs in (say) 70% of all trials, then it 
fails to occur in the other 30%. The probability that an event occurs and the 
probability that it does not occur always add to 100%, or 1. 


We can use mathematical notation to state Facts 1 to 4 more concisely. Capital 
letters near the beginning of the alphabet denote events. If A is any event, we write 
its probability as P(A). Here are our probability facts in formal language. As you 
apply these rules, remember that they are just another form of intuitively true facts 
about long-run proportions. 


Probability Rules 


Rule 1. The probability P(A) of any event A satisfies 0 = P(A) = 1. 
Rule 2. If S is the sample space in a probability model, then P(S) = 1. 
Rule 3. Two events A and B are disjoint if they have no outcomes in common and 
so can never occur together. If A and B are disjoint, 
P(A or B) = P(A) + P(B) 
This is the addition rule for disjoint events. 
Rule 4. For any event A, 


P(A does not occur) = 1 — P(A) 


The addition rule extends to more than two events that are disjoint in the sense 
that no two have any outcomes in common. If events A, B, and C are disjoint, the 
probability that one of these events occurs is P(A) + P(B) + P(C). 


We already used the addition rule for disjoint events, without calling it by that name, to find 
the probabilities in Example 12.5. The event “roll a 5” contains the four disjoint outcomes 
displayed in Example 12.4, so the addition rule (Rule 3) says that its probability is 


P(roll a 5) = P(E 2])+ P(E J) + ei ) + ei 2) 


1 1 1 
+++ 
36 36 36 36 


Check that the probabilities in Example 12.5, found using the addition rule, are all between 
O and 1 and add to exactly 1. That is, this probability model obeys Rules 1 and 2. 
What is the probability of rolling anything other than a 5? By Rule 4, 


P(roll does not givea 5) = 1 — P(rolla5) 
= 11-0111 = 0.889 


Our model assigns probabilities to individual outcomes. To find the probability of an 
event, just add the probabilities of the outcomes that make up the event. For example: 


P(outcome is odd) = P(3) + P(5) + P(7) + P(9) + P(11) 
2 4 6 4 2 
= 442454 
36 36 36 36 36 
18 1 


= ._=-78 
36 2 
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The StatBoards video, The Four Basic Probability Rules, discusses some examples 
of the use of the four probability rules, including examples of disjoint events. 


Apply Your Knowledge 


12.8 Who Takes the GMAT? In many settings, the “rules of probability” are 
just basic facts about percents. The Graduate Management Admission Test 
(GMAT) website provides the following information about the geographic 
regions of those who took the test in 2012-13: 2.3% were from Africa; 0.4% 
were from Australia and the Pacific Islands; 2.7% were from Canada; 11.5% 
were from Central and South Asia; 29.1% were from East and Southeast 
Asia; 2.1% were from Eastern Europe; 2.8% were from Mexico, the Caribbean, 
and Latin America; 3.6% were from the Middle East; 38.0% were from the 
United States; and 7.4% were from Western Europe.’ 

(a) What percent of those who took the test in 2012-13 were from 
North America (either Canada, the United States, Mexico, the 
Caribbean, or Latin America)? Which rule of probability did you 
use to find the answer? 

(b) What percent of those who took the test in 2012-13 were from 
some other region than the United States? Which rule of probability 
did you use to find the answer? 


12.9 Overweight? Although the rules of probability are just basic facts about 
percents or proportions, we need to be able to use the language of events 
and their probabilities. Choose an American adult aged 20 years and over 
at random. Define two events: 


A = the person chosen is obese 
B = the person chosen is overweight, but not obese 


According to the National Center for Health Statistics, P(A) = 0.36 and 


P(B) = 0.33. 
(a) Explain why events A and B are disjoint. 
(b) Say in plain language what the event “A or B” is. What is P(A or B)? 


(c) IfC is the event that the person chosen has normal weight or less, 
what is P(C)? 
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12.10 Languages in Canada. Canada has two official languages: English and 
French. Choose a Canadian at random and ask, “What is your mother 
tongue?” Here is the distribution of responses, combining many separate 
languages from the province of Quebec:* 


Language _ English French Other. 
Probability 0.083 0.789? 


(a) What is the probability that a Canadian’s mother tongue is either English 
or French? 

(b) What probability should replace “?” in the distribution? 

(c) What is the probability that a Canadian’s mother tongue is not English? 


12.11 Are They Disjoint? Which of the following pairs of events, A and B, are 
disjoint? Explain your answers. 


(a) A person is selected at random. A is the event “sex of the person selected 
is male” and B is the event “sex of the person selected is female.” 

(b) A person is selected at random. A is the event “the person selected 
earns more than $100,000 per year” and B is the event “the person 
selected earns more than $250,000 per year.” 

(c) A pair of dice are tossed. A is the event “one of the dice is a 3” and B 
is the event “the sum of the two dice is 3.” 


Examples 12.4, 12.5, and 12.6 illustrate one way to assign probabilities to events: 
assign a probability to every individual outcome, then add these probabilities to find 
the probability of any event. This idea works well when there are only a finite (fixed 
and limited) number of outcomes. 


Finite Probability Model 


A probability model with a finite sample space is called finite. 

To assign probabilities in a finite model, list the probabilities of all the individual 
outcomes. These probabilities must be numbers between 0 and 1 that add to exactly 1. 
The probability of any event is the sum of the probabilities of the outcomes making 
up the event. 


Finite probability models are sometimes called discrete probability models. 
However, discrete probability models include finite sample spaces as well as sample 
spaces that are infinite and equivalent to the set of all positive integers. An example 
of a discrete but not finite sample space would be the sample space for the number 
of free-throw attempts until a basketball player makes her first free throw. This 
could occur on her first attempt, her second attempt, her third attempt, and so on. 
Assigning probabilities to individual outcomes in an infinite discrete sample space 
is more complicated than for a finite sample space. In this book we will often refer 
to finite probability models as discrete, and in practice statisticians often refer to 
finite probability models as discrete. 
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Faked numbers in tax returns, invoices, or expense account claims often display pat- 
terns that aren’t present in legitimate records. Some patterns, such as too many round 
numbers, are obvious and easily avoided by a clever crook. Others are more subtle. 
It is a striking fact that the first digits of numbers in legitimate records often follow a 
model known as Benford’s law.® Call the first digit of a randomly chosen record X for 
short. Benford’s law gives this probability model for X (note that a first digit can’t be 0): 


First Digit X 1 2 3 4 5 6 7 8 9 
Probability 0.301 0.176 0.125 0.097 0.079 0.067 0.058 0.051 0.046 


Check that the probabilities of the outcomes sum to exactly 1. This is therefore a valid 

finite (or discrete) probability model. With these probabilities, investigators can detect 

fraud by comparing the first digits in records such as invoices paid by a business. 
The probability that a first digit is equal to or greater than 6 is 


P(X = 6) = P(X = 6) + P(X = 7) + P(X = 8) + P(X = 9) 
= 0.067 + 0.058 + 0.051 + 0.046 = 0.222 
This is less than the probability that a record has first digit 1, 
P(X = 1) = 0.301 
Fraudulent records tend to have too few 1s and too many higher first digits. 
A Note that the probability that a first digit is greater than or equal to 6 is not 


the same as the probability that a first digit is strictly greater than 6. 
The latter probability is 


P(X > 6) = 0.058 + 0.051 + 0.046 = 0.155 


The outcome X = 6 is included in “greater than or equal to” and is not included in 
“strictly greater than.” m 


In Chapter 9 (page 235) we discussed completely randomized experimental designs. 
Suppose you have three men—Ari, Luis, Troy—and three women—Ana, Deb, and 
Hui—for an experiment, Three of the six subjects are to be assigned completely at 
random to a new experimental weight loss treatment and three to a placebo. Here all 


Treatment Group | Treatment Group 


Ari, Luis, Troy 


Luis, Troy, Ana 


Ari, Luis, Ana 


Luis, Troy, Deb 


Ari, Luis, Deb 
Ari, Luis, Hui 


Luis, Troy, Hui 
Luis, Ana, Deb 


Ari, Troy, Ana 


Luis, Ana, Hui 


Ari, Troy, Deb 
Ari, Troy, Hui 


Luis, Deb, Hui 
Troy, Ana, Deb 


Ari, Ana, Deb 


Troy, Ana, Hui 


Ari, Ana, Hui 
Ari, Deb, Hui 


Troy, Deb, Hui 
Ana, Deb, Hui 
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20 possible ways of selecting three of these subjects for the treatment group (the 
remaining three are in the placebo group). 

With a completely randomized design, each of these 20 possible treatment groups 
is equally likely, thus each has probability 1/20 of being the actual group assigned to 
the treatment. Notice that the chance that all the men are assigned to the treatment 
group is 1/20, and the chance that the treatment group consists of either all men or 
all women is 2/20. m 
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* The StatClips Examples video, Probability Distributions Example A, discusses 
Benford’s law. 


Apply Your Knowledge 


12.12 Rolling a Die. Figure 12.3 displays several possible finite probability 
models for rolling a die. We can learn which model is actually accurate 
for a particular die only by rolling the die many times. However, some of 
the models are not valid. That is, they do not obey the rules. Which are 
valid and which are not? In the case of the invalid models, explain what is 


wrong. 
Probability 
Outcome Model1 Model 2 Model 3 Model 4 
[+] 7 13 13 1 
| 1/7 1/6 1/6 1 
[J 1/7 6 6 2 
eal 17 ) 1/6 ii 
[r=3] 1/7 1/6 1/6 1 
FIGURE 12.3 ; 
1/6 1/6 2 
Four assignments of probabilities to the pea ie / / 


six faces of a die, for Exercise 12.12. 


12.13 Benford’s Law. The first digit of a randomly chosen expense account 
claim follows Benford’s law (Example 12.7). Consider the events 
A = {first digit is 4 or greater} 
B = {first digit is even} 
(a) What outcomes make up the event A? What is P(A)? 
(b) What outcomes make up the event B? What is P(B)? 
(c) What outcomes make up the event “A or B”? What is P(A or B)? 
Why is this probability not equal to P(A) + P(B)? 


12.14 Dinner at Home. Choose a parent with children under age 18 in the 
United States at random and ask, “How many nights a week out of seven 
does your family eat dinner together at home?” Call the response X for 
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short. Based on a large sample survey, here is a probability model for the 
answer you will get:® 


0 1 2 3 4 5 6 7 
OOS OOO O07 OOS MOORS OF Os SOS 


Nights 
Probability 


(a) Verify that this is a valid finite probability model. 
(b) Describe the event X < 4 in words. What is P(X < 4)? 


(c) Express the event “eat dinner together at home at least once a week” 
p g 
in terms of X. What is the probability of this event? 


12.6 Continuous probability models ag, 
REALLY RANDOM 


When we use the table of random digits to select a digit between 0 and 9, the My DIGITS 
finite probability model assigns probability 1/10 to each of the 10 possible outcomes. oy pins. the RAND Comaration 
Suppose that we want to choose a number at random between 0 and 1, allowing any long ago published a book titled 
number between 0 and | as the outcome. Software random number generators will | One Million Random Digits. The 
do this. For example, here is the result of asking software to produce five random _ book lists 1,000,000 digits that were 


numbers (to a fixed number of decimal places) between 0 and 1: produced by a very elaborate physical 
randomization and really are random. 


0.2893511 0.3213787 0.5816462 0.9787920 0.4475373 An employee of RAND once said 
that this is not the most boring book 
The sample space is now an entire interval of numbers: that RAND has ever published. 


S = {all numbers between 0 and 1} 


Call the outcome of the random number generator Y for short. How can we assign 
probabilities to such events as {0.3 = Y = 0.7}? As in the case of selecting a ran- 
dom digit, we would like all possible outcomes to be equally likely. But we cannot 
assign probabilities to each individual value of Y and then add them, because there 
is an infinite continuum of possible values. In fact, we cannot even make a list of 
the individual values of Y. For example, what is the next largest value of Y after 0.3? 

We use a new way of assigning probabilities directly to events—as areas under 
a density curve. Any density curve has area exactly 1 underneath it, correspond- 
ing to total probability 1. We met density curves as models for data in Chapter 3 
(page 75). 


Continuous Probability Model 


A continuous probability model assigns probabilities as areas under a density 
curve. The area under the curve and above any range of values is the probability of 
an outcome in that range. 


The random number generator will spread its output uniformly across the entire inter- 
val from O to 1 as we allow it to generate a long sequence of numbers. Figure 12.4 
is a histogram of 10,000 random numbers. They are quite uniform, but not exactly 
so. The bar heights would all be exactly equal (1000 numbers for each bar) if the 
10,000 numbers were exactly uniform. In fact, the counts vary from a low of 978 
to a high of 1060. 

As in Chapter 3, we have adjusted the histogram scale so that the total area of the 
bars is exactly 1. Now we can add the density curve that describes the distribution of 
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FIGURE 12.4 


The probability model for the outcomes 
of a software random number generator, 
for Example 12.9. Compare the histo- q 
gram of 10,000 actual outcomes with 
the uniform density curve that spreads 
probability evenly between O and 1. 
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perfectly random numbers. This density curve also appears in Figure 12.4. It has height 

uniform distribution 1 over the interval from O to 1. This is the density curve of a uniform distribution. 
It is the continuous probability model for the results of generating very many random 
numbers. Like the probability models for perfectly balanced coins and dice, the density 
curve is an idealized description of the outcomes of a perfectly uniform random num- 
ber generator. It is a good approximation for software outcomes, but even 10,000 tries 
isn’t enough for actual outcomes to look exactly like the idealized model. m 


The uniform density curve has height 1 over the interval from 0 to 1. The area 
under the curve is 1, and the probability of any event is the area under the curve 
and above the interval that corresponds to the event in question. Figure 12.5 illus- 
trates finding probabilities as areas under the density curve. The probability that 
the random number generator produces a number between 0.3 and 0.7 is 


P(0.3 <Y <0.7) =04 


Area = 0.4 Area = 0.5 Area = 0.2 


Height = 1 
FIGURE 12.5 el a 
Probability as area under a density fe) 0.3 0.7 1 0 0.5 0.8 1 
curve. The uniform density curve spreads (a) P(03<v<o07) (b) P(YS0.5 ory > 0.8) 


probability evenly between O and 1. 
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because the area under the density curve and above the interval from 0.3 to 0.7 is 
0.4. The height of the curve is 1, and the area of a rectangle is the product of height 
and length, so the probability of any interval of outcomes is just the length of the 
interval. Similarly, 


P(Y = 0.5) = 0.5 
P(Y > 0.8) = 0.2 
P(Y S$ 0.50rY > 0.8) = 0.7 


The last event consists of two nonoverlapping intervals, so the total area above the 
event is found by adding two areas, as illustrated by Figure 12.5(b). This assignment 
of probabilities obeys all our rules for probability. 

Continuous probability models assign probabilities to intervals of outcomes 
rather than to individual outcomes. In fact, all continuous probability models assign 
probability 0 to every individual outcome. Only intervals of values have positive prob- 
ability. To see that this is true, consider a specific outcome such as P(Y = 0.8). The 
probability of any interval is the same as its length. The point 0.8 has no length, 
so its probability is 0. Put another way, P(Y > 0.8) and P(Y = 0.8) are both 0.2 
because that is the area in Figure 12.5(b) between 0.8 and 1. 

We can use any density curve to assign probabilities. The density curves that 
are most familiar to us are the Normal curves. Normal distributions are continu- 
ous probability models as well as descriptions of data. There is a close connection 
between a Normal distribution as an idealized description for data and a Normal 
probability model. If we look at the heights of all young women, we find that they 
closely follow the Normal distribution with mean pp = 64.3 inches and standard 
deviation o = 2.7 inches. This is a distribution for a large set of data. Now choose 
one young woman at random. Call her height X. If we repeat the random choice 
very many times, the distribution of values of X is the same Normal distribution 
that describes the heights of all young women. 


68 and 70 inches? The height X of the woman we choose has the N(64.3, 2.7) 
distribution. We want P(68 = X = 70). This is the area under the Normal curve in 
Figure 12.6. Software or the Normal Density Curve applet will give us 
the answer at once: P(68 = X S 70) = 0.0679. 

We can also find the probability by standardizing and using Table A, 
the table of standard Normal probabilities. We will reserve capital Z 
for a standard Normal variable. 


What is the probability that a randomly chosen young woman has height between eS 


68 — 64.3 X- 64.3 70 - 64.3 
P68 = X= 70) = P( = = ) 
20] 2.7 27 
= P(1.37 SZ 2.11) 
= P(Z = 2.11) — P(Z = 1.37) 


= 0.9826 — 0.9147 = 0.0679 


Firs Ve 
Blend Images/Getty Images 
The calculation is the same as those we did in Chapter 3. Only the 


language of probability is new. m= 
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Normal curve 
uw = 643,06 =2.7 


FIGURE 12.6 


The probability in Example 12.10 as an 


area under a Normal curve. 


FIGURE 12.7 

The density curve for the sum of two 
random numbers, for Exercise 12.16. 
This density curve spreads probability 
between O and 2. 


Probability = 0.0679 


68 70 
Height in inches 


(3) LaunchPad Online Resources 


* The StatClips Examples videos, Probability Distributions Example B and C, 
discuss examples of calculating probabilities for continuous probability models. 


Apply Your Knowledge 


12.15 


12.16 


Random Numbers. Let Y be a random number between 0 and 1 pro- 
duced by the idealized random number generator described in Example 12.9 
and Figure 12.4. Find the following probabilities: 


(ay PY = 06) 

(by PCY = 10.6) 

(c) P(0.4=Y 0.8) 

(d) P(0.4 < Y $0.8) 

Adding Random Numbers. Generate two random numbers between 0 


and 1 and take X to be their sum. The sum X can take any value between 0 
and 2. The density curve of X is the triangle shown in Figure 12.7. 


(a) Verify by geometry that the area under this curve is 1. 


(b) What is the probability that X is less than 1? (Sketch the density 
curve, shade the area that represents the probability, then find that 
area. Do this for part (c) also.) 


(c) What is the probability that X is less than 0.5? 


Height = 1 


12.17 The Medical College Admission Test. The Normal distribution with 
mean po = 25.3 and standard deviation o = 6.5 is a good description of 
the total score on the Medical College Admission Test (MCAT).’ This is a 
continuous probability model for the score of a randomly chosen student. 
Call the score of a randomly chosen student X for short. 


(a) Write the event “the student chosen has a score of 35 or higher” in 
terms of X. 


(b) Find the probability of this event. 


Examples 12.7, 12.9, and 12.10 use a shorthand notation that is often convenient. 
In Example 12.10, we let X stand for the result of choosing a woman at random and 
measuring her height. We know that X would take a different value if we made 
another random choice. Because its value changes from one random choice to 
another, we call the height X a random variable. 


Random Variable 


A random variable is a variable whose value is a numerical outcome of a random 
phenomenon. 

The probability distribution of a random variable X tells us what values X can take 
and how to assign probabilities to those values. 


We usually denote random variables by capital letters near the end of the alphabet, 
such as X or Y. Of course, the random variables of greatest interest to us are outcomes 
such as the mean x of a random sample, for which we will keep the familiar nota- 
tion. There are two main types of random variables, corresponding to two types of 
probability models: discrete and continuous. Notice that neither a finite sample space 
nor a sample space consisting of all positive integers would be considered continuous, 
because neither sample space is an interval of all possible numbers (to an arbitrary 
number of decimal places) between two given values. Thus, we classify random vari- 
ables as either discrete or continuous, rather than as either finite or continuous. 


The first digit X in Example 12.7 is a random variable whose possible values are 
the whole numbers {1, 2, 3, 4, 5, 6, 7, 8, 9}. The distribution of X assigns a 
probability to each of these outcomes. A discrete random variable has a finite 
list of possible outcomes. 

Compare the output Y of the random number generator in Example 12.8. The values 
of Y fill the entire interval of numbers between O and 1. The probability distribution of Y 
is given by its density curve, shown in Figure 12.4. A continuous random variable can 
take on any value in an interval, with probabilities given as areas under a density curve. m 


We have defined a random variable to be a variable whose value is a numerical 
outcome of a random phenomenon. However, there are situations in which the out- 
comes of a random phenomenon are not numerical. In Example 12.2, the possible 
outcomes of a coin toss are heads and tails. We can represent these outcomes by 
numbers, letting 1 represent heads and 0 represent tails. Using numbers to represent 
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discrete random variable 


continuous random variable 
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the nonnumerical outcomes of a random phenomenon is a common practice in 
statistics. For mathematical purposes it is convenient to restrict random variables to 
take on numerical values, even if this means representing nonnumerical outcomes 
by numbers. 


Apply Your Knowledge 


12.18 Grades in an Economics Course. Indiana University posts the grade 
distributions for its courses online.’ Students in Economics 201 in the fall 
2013 semester received these grades: 18% A, 8% A—, 7% B+, 16% B, 11% 
B—, 6% C+, 12% C, 4% C—, 3% D+, 6% D, 4% D-—, and 5% E Choose 
an Economics 201 student at random. To “choose at random” means to 
give every student the same chance to be chosen. The student’s grade on a 
four-point scale (with A = 4, A— = 3.7, B+ = 3.3, B = 3.0, B— = 2.7, 
CH S43; C = 10, C= = if, IDS = 13, ID = LO, ID= = OL7, anal 
F = 0.0) is a discrete random variable X with this probability distribution: 


Wellstone || CO Of i) i ky 20 43 Ae SO 3s So 40, 
Probability 0.05 0.04 0.06 0.03 0.04 0.12 0.06 O11 0.16 0.07 0.08 0.18 


(a) Say in words what the meaning of P(X = 3.0) is. What is this probability? 


(b) Write the event “the student got a grade poorer than B—” in terms of 
values of the random variable X. What is the probability of this event? 


12.19 Running a Mile. A study of 12,000 able-bodied male students at the Uni- 
versity of Illinois found that their times for the mile run were approximately 
Normal with mean 7.11 minutes and standard deviation 0.74 minute.’ 
Choose a student at random from this group and call his time for the mile Y. 


(a) Say in words what the meaning of P(Y = 8) is. What is this probability? 
(b) Write the event “the student could run a mile in less than 6 minutes” 


in terms of values of the random variable Y. What is the probability of 
this event? 


We began our discussion of probability with one idea: the probability of an 
outcome of a random phenomenon is the proportion of times that outcome would 
occur in a very long series of repetitions. This idea ties probability to actual out- 
comes. It allows us, for example, to estimate probabilities by simulating random 
phenomena. Yet we often meet another, quite different, idea of probability. 


Joe sits staring into his beer as his favorite baseball team, the Chicago Cubs, loses 
another game. The Cubbies have some good young players, so let’s ask Joe, “What's 
the chance that the Cubs will go to the World Series next year?” Joe brightens up. 
“Oh, about 10%,” he says. 

Does Joe assign probability 0.10 to the Cubs appearing in the World Series? 
The outcome of next year’s pennant race is certainly unpredictable, but we can’t 
reasonably ask what would happen in many repetitions. Next year’s baseball season 


*This short section is optional. 
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will happen only once and will differ from all other seasons in players, weather, and 
many other ways. If probability measures “what would happen if we did this many 
times,” Joe’s 0.10 is not a probability. Probability is based on data about many 
repetitions of the same random phenomenon. Joe is giving us something else, his 
personal judgment. = 


Although Joe’s 0.10 isn’t a probability in our usual sense, it gives useful informa- 
tion about Joe’s opinion. More seriously, a company asking, “How likely is it that 
building this plant will pay off within five years?” can’t employ an idea of probability 
based on many repetitions of the same thing. The opinions of company officers and 
advisers are nonetheless useful information, and these opinions can be expressed in 
the language of probability. These are personal probabilities. 


Personal Probability 


A personal probability of an outcome is a number between 0 and | that expresses 
an individual’s judgment of how likely the outcome is. 


Rachel’s opinion about the Cubs may differ from Joe’s, and the opinions of several 
company officers about the new plant may differ. Personal probabilities are indeed 
personal: they vary from person to person. Moreover, if two people assign different 
personal probabilities to an event, it may be difficult or impossible to determine who 
is more correct. If we say, “In the long run, this coin will come up heads 60% of 
the time,” we can find out if we are right by actually tossing the coin several thou- 
sand times. If Joe says, “I think the Cubs have a 10% chance of going to the World 
Series next year,” that’s just Joe’s opinion. Why think of personal probabilities as 
probabilities? Because any set of personal probabilities that makes sense obeys the same 
basic Rules 1 to 4 that describe any legitimate assignment of probabilities to events. If Joe 
thinks there’s a 10% chance that the Cubs will go to the World Series, he must also 
think that there’s a 90% chance that they won’t go. There is just one set of rules of 
probability, even though we now have two interpretations of what probability means. 


pe 
A WHAT ARE THE 
win ODDS? 
Gamblers often express chance in 
terms of odds rather than probability. 
Odds of A to B against an outcome 
means that the probability of that 
outcome is B/(A + B). So “odds 
of 5 to 1” is another way of saying 
“probability 1/6.” A probability is 
always between 0 and 1, but odds 
range from 0 to infinity. Although 
odds are mainly used in gambling, 
they give us a way to make very small 
probabilities clearer. “Odds of 999 to 
1” may be easier to understand than 
“probability 0.001.” 


Apply Your Knowledge 


12.20 Will You Have an Accident? The probability that a randomly chosen 
driver will be involved in an accident in the next year is about 0.2. This is 
based on the proportion of millions of drivers who have accidents. “Accident” 
includes things like crumpling a fender in your own driveway, not just high- 
way accidents. 


(a) What do you think is your own probability of being in an accident in 
the next year? This is a personal probability. 


(b) Give some reasons why your personal probability might be a more 
accurate prediction of your “true chance” of having an accident than 
the probability for a random driver. 

(c) Almost everyone says their personal probability is lower than the 
random driver probability. Why do you think this is true? 


12.21 Winning the ACC Tournament. The annual Atlantic Coast Conference 
men’s basketball tournament has temporarily taken Joe’s mind off of the 
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Chicago Cubs. He says to himself, “I think that Syracuse has probability 
0.1 of winning. North Carolina’s probability is twice Syracuse’s, and Duke’s 


apenas 


probability is three times Syracuse’s. 


(a) What are Joe’s personal probabilities for North Carolina and Duke? 


(b) What is Joe’s personal probability that one of the 12 teams other than 
Syracuse, North Carolina, and Duke will win the tournament? 


CHAPTER 12 SUMMARY 


Chapter Specifics 


= A random phenomenon (chance experiment) has outcomes that we cannot predict 
but that nonetheless have a regular distribution in very many repetitions. 


mu The probability of an event is the proportion of times the event occurs in many 
repeated trials of a random phenomenon. 


ma A probability model for a random phenomenon consists of a sample space S and an 
assignment of probabilities P. 


m The sample space S is the set of all possible outcomes of the random phenomenon. Sets 
of outcomes are called events. P assigns a number P(A) to an event A as its probability. 


m Any assignment of probability must obey the rules that state the basic properties of 
probability: 


1. 0 = P(A) $< 1 for any event A. 
2. P(S) = 1. 


3. Addition rule for disjoint events: Events A and B are disjoint if they have no 
outcomes in common. If A and B are disjoint, then P(A or B) = P(A) + P(B). 


4. For any event A, P(A does not occur) = 1 — P(A). 


m When a sample space S contains finitely many possible values, a finite probability 
model assigns each of these values a probability between 0 and 1 such that the sum of 
all the probabilities is exactly 1. The probability of any event is the sum of the prob- 
abilities of all the values that make up the event. Finite probability models are also 
referred to as discrete probability models. 


m A sample space can contain all values in some interval of numbers. A continuous 
probability model assigns probabilities as areas under a density curve. The probability 
of any event is the area under the curve above the values that make up the event. 


m A random variable is a variable taking numerical values determined by the outcome 
of a random phenomenon. The probability distribution of a random variable X tells 
us what the possible values of X are and how probabilities are assigned to those values. 


m A random variable X and its distribution can be discrete or continuous. The distribu- 
tion of a discrete random variable with finitely many possible values gives the prob- 
ability of each value. A continuous random variable takes all values in some interval 
of numbers. A density curve describes the probability distribution of a continuous 
random variable. 


Link It 


This chapter begins our study of probability. The important fact is that random phenomena 
are unpredictable in the short run, but have a regular and predictable behavior in the long 
run. Probability rules and probability models provide the tools for describing and predicting 
the long-run behavior of random phenomena. 


Probability helps us understand why we can trust random samples and randomized 
comparative experiments, the subjects of Chapters 8 and 9. It is the key to generalizing 
what we learn from data produced by random samples and randomized comparative experi- 
ments to some wider universe or population. How we use probability to do this will be the 
topic of the remainder of this book. 


() LaunchPad Online Resources 
If you are having difficulty with any of the sections of this chapter, these online 
resources should help prepare you to solve the exercises at the end of this chapter. 


¢ StatTutor starts with a video review of each section and asks a series of 
questions to check your understanding. 


* LearningCurve provides you with a series of questions about the chapter 
geared to your level of understanding. 


Summary 
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CHECK YOUR SKILLS 


12.22 You read in a book on poker that the probability of 


12.23 


being dealt two pairs in a five-card poker hand is 
1/21. This means that 


(a) if you deal thousands of poker hands, the frac- 
tion of them that contain two pairs will be very 
close to 1/21. 

(b) if you deal 21 poker hands, exactly one of them 
will contain two pairs. 

(c) if you deal 21,000 poker hands, exactly 1000 of 
them will contain two pairs. 

A basketball player shoots five free throws during a 
game. The sample space for counting the number 
she makes is 

(a) S = any number between 0 and 1. 

(b) S = whole numbers 0 to 5. 


(c) S = all sequences of five hits or misses, like 
HMMHH. 


Here is the probability model for the political affiliation of a ran- 
domly chosen adult in the United States.!° Exercises 12.24 to 12.27 
use this information. 


Political 

affiliation | Republican Independent Democrat Other 
Probability 0.25 0.42 0.30 ? 
12.24 This probability model is 


12.25 


12.26 


(b) finite. 


The probability that a randomly chosen American 
adult’s political affiliation is “Other” must be 


(a) continuous. (c) equally likely. 


(a) any number between 0 and 1. 
(b) 0.03. (c) 0.3. 


What is the probability that a randomly chosen 
American adult is a member of one of the two major 
political parties (Republicans and Democrats)? 


(a) 0.42 (b) 0.55 (c) 0.97 


12.27 


12.28 


12.29 


12.30 


What is the probability that a randomly chosen 
American adult is not a Republican? 

(a) 0.25 (b) 0.75 (c) 0.03 

In a table of random digits such as Table B, each digit 
is equally likely to be any of 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9. 
What is the probability that a digit in the table is a 7? 
(a) 1/9 (b) 1/10 (c) 9/10 

In a table of random digits such as Table B, each 
digit is equally likely to be any of 0, 1, 2, 3, 4, 5, 6, 
7, 8, or 9. What is the probability that a digit in the 
table is 7 or greater? 


(a) 7/10 (b) 4/10 (c) 3/10 


Choose an American household at random, and let 
the random variable X be the number of cars (in- 
cluding SUVs and light trucks) the residents own. 
Here is the probability model if we ignore the few 
households that own more than six cars:!! 


Number 
of cars X 0 1 2 3 4 5 6 
Probability 0.09 0.33 036 0.14 0.05 0.02 0.01 


12.31 


A housing company builds houses with two-car ga- 
rages. What percent of households have more cars 
than the garage can hold? 

(a) 14% (b) 22% (c) 42% 

Choose a common fruit fly Drosophila melanogaster 
at random. Call the length of the thorax (where 
the wings and legs attach) Y. The random vari- 
able Y has the Normal distribution with mean 
= 0.800 millimeter (mm) and standard deviation 
o = 0.078 mm. The probability P(Y > 1) that the 
fly you choose has a thorax more than 1 mm long is 
about 


(a) 0.995. (b) 0.5. (c) 0.005. 


| CHAPTER 12 EXERCISES 


12.32 


Sample space. In each of the following situations, de- 
scribe a sample space S for the random phenomenon. 
(a) A basketball player shoots four free throws. You 
record the sequence of hits and misses. 

(b) A basket- 
ball player 


ee 

is} 
shoots four 5 
free throws. 
You record cae 

25 
the number s 2 
of baskets BES 
she makes. aa 


12.33 Probability models? In each of the following situ- 


ations, state whether or not the given assignment 
of probabilities to individual outcomes is legitimate, 
that is, satisfies the rules of probability. Remember, 
a legitimate model need not be a practically reason- 
able model. If the assignment of probabilities is not 
legitimate, give specific reasons for your answer. 


(a) Roll a six-sided die, and record the count of 
spots on the up-face: 


P(1) =0 
P44) =18 


P(2) = 1/6 
Jey is) 


P(female full-time) = 0.56 


(b) Deal a card from a shuffled deck: 

P(clubs) = 12/52 P(diamonds) = 12/52 
P(hearts) = 12/52 P(spades) = 16/52 
(c) Choose a college student at random and record 
sex and enrollment status: 


P(male full-time) = 0.44 


P(female part-time) = 0.24 P(male part-time) = 0.17 


12.34 


12.35 


12.36 


Education among young adults. Choose a young 
adult (aged 25 to 29) at random. The probability is 
0.10 that the person chosen did not complete high 
school, 0.27 that the person has a high school di- 
ploma but no further education, and 0.34 that the 
person has at least a bachelor’s degree. 


(a) What must be the probability that a randomly 
chosen young adult has some education beyond 
high school but does not have a bachelor’s degree? 
(b) What is the probability that a randomly chosen 
young adult has at least a high school education? 


Land in Canada. Canada’s national statistics agency, 
Statistics Canada, says that the land area of 
Canada is 9,094,000 square kilometers. Of this land, 
4,176,000 square kilometers are forested. Choose a 
square kilometer of land in Canada at random. 


(a) What is the probability that the area you chose 
is forested? 
(b) What is the probability that it is not forested? 


Foreign-language study. Choose a student in a U.S. 
public high school at random and ask if he or she is 
studying a language other than English. Here is the 
distribution of results: 


Language 


Spanish French German All others None 


Probability 


12.37 


Color 


0.30 0.08 0.02 0.03 0.57 


(a) Explain why this is a legitimate probability 
model. 

(b) What is the probability that a randomly chosen 
student is studying a language other than English? 
(c) What is the probability that a randomly chosen 
student is studying French, German, or Spanish? 


Car colors. Choose a new car or light truck at random 
and note its color. Here are the probabilities of the most 
popular colors for vehicles sold globally in 2013:” 


White Black Silver Gray Red Beige, brown Blue 


Probability 


O75) “Obikss Obits; OL? Oo) 0.08 0.07 
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(a) What is the probability that the vehicle you 
chose has any color other than those listed? 

(b) What is the probability that a randomly chosen 
vehicle is neither white nor silver? 


Drawing cards. You are about to draw a card at ran- 
dom (that is, all choices have the same probability) 


12.39 


12.40 


12.41 
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from a set of seven cards. Although you can’t see the 
cards, here they are: 


seh (OV Vaal ¥ 
ee | VY ann ¥ 
oe | ava 

a 


vr] a atl vv 


a 
3 


v vile oc] @ of 


(a) What is the probability that you draw a 9? 

(b) What is the probability that you draw a red 9? 
(c) What is the probability that you do not draw a7? 
Loaded dice. There are many ways to produce 
crooked dice. To load a die so that 6 comes up too 
often and 1 (which is opposite 6) comes up too sel- 
dom, add a bit of lead to the filling of the spot on 
the 1 face. If a die is loaded so that 6 comes up with 
probability 0.2 and the probabilities of the 2, 3, 4, 
and 5 faces are not affected, what is the assignment 
of probabilities to the six faces? 

A door prize. A party host gives a door prize to one 
guest chosen at random. There are 48 men and 42 
women at the party. What is the probability that the 
prize goes to a woman? Explain how you arrived at 
your answer. 

Race and ethnicity. The U.S. Census Bureau allows 
each person to choose from a long list of races. That is, 
in the eyes of the U.S. Census Bureau, you belong to 
whatever race you say you belong to. “Hispanic/Latino” 
is a separate category; Hispanics may be of any race. If 
we choose a resident of the United States at random, 
the U.S. Census Bureau gives these probabilities: 


HISPANIC NOT HISPANIC 


Asian 0.002 0.051 
Black 0.008 0.126 
White 0.153 0.645 
Other 0.006 0.009 


(a) Verify that this is a legitimate assignment of 
probabilities. 

(b) What is the probability that a randomly chosen 
American is Hispanic? 

(c) Non-Hispanic whites are the historical majority 
in the United States. What is the probability that 

a randomly chosen American is not a member of 
this group? 
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Choose at random a person aged 15 to 44 years. Ask their age and 
who they live with (alone, with spouse, with other persons). Here is 
the probability model for 12 possible answers:* 


AGE IN YEARS 


15-19 20-24 25-34 35-44 its 
Alone 0.001 0.010 0.032 0.028 
With spouse 0.001 0.021 0.151 0.208 
With others (not a spouse) | 0.168 0.139 0.149 0.092 


Exercises 12.42 to 12.44 use this probability model. 
12.42 


12.43 


12.44 


12.45 


12.46 


Living arrangements. 

(a) Why is this a legitimate finite probability model? 
(b) What is the probability that the person chosen 
is a 15- to 19-year-old who lives with others (not a 
spouse)? 

(c) What is the probability that the person is 15 to 
19 years old? 

(d) What is the probability that the person chosen 


lives with others (not a spouse)? 


Living arrangements, continued. ee 


(a) List the outcomes that make up the event 


A = {The person chosen is either 15 to 19 years old 
or lives with others, or both} 

(b) What is P(A)? Explain carefully why P(A) is not 

the sum of the probabilities you found in parts (c) 

and (d) of Exercise 12.42. 


Living arrangements, continued. 


(a) What is the probability that the person chosen 
is 20 years old or older? 

(b) What is the probability that the person chosen 
does not live alone? 


Spelling errors. Spell-checking software catches 
“nonword errors” that result in a string of letters that 
is not a word, as when “the” is typed as “teh.” When 
undergraduates are asked to type a 250-word essay 
(without spell-checking), the number X of nonword 
errors has the following distribution: 


Value of X 0 1 2 3 4 
Probability | 01 02 03 03 O11 


(a) Is the random variable X discrete or continuous? 
Why? 

(b) Write the event “at least one nonword error” in 
terms of X. What is the probability of this event? 

(c) Describe the event X = 2 in words. What is 

its probability? What is the probability that X < 2? 


First digits again. A crook who never heard of Ben- 
ford’s law might choose the first digits of his faked 
invoices so that all of 1, 2, 3, 4, 5, 6, 7, 8, and 9 are 
equally likely. Call the first digit of a randomly cho- 
sen fake invoice W for short. 


12.49 


(a) Write the probability distribution for the 
random variable W. 

(b) Find P(W = 6) and compare your result with 
the Benford’s law probability from Example 12.7. 


Who gets interviewed? Abby, Deborah, Mei-Ling, 
Sam, and Roberto are students in a small seminar 
course. Their professor decides to choose two of 
them to interview about the course. To avoid unfair- 
ness, the choice will be made by drawing two names 
from a hat. (This is an SRS of size 2.) 


(a) Write down all possible choices of two of the 
five names. This is the sample space. 

(b) The random drawing makes all choices equally 
likely. What is the probability of each choice? 

(c) What is the probability that Mei-Ling is chosen? 


(d) Abby, Deborah, and Mei-Ling liked the course. 
Sam and Roberto did not like the course. What is 
the probability that both people selected liked the 
course? 


Birth order. A cou- 
ple plans to have 
three children. There 
are eight possible ar- 
rangements of girls 
and boys. For ex- 
ample, GGB means 
the first two children 
are girls and the third 
child is a boy. All 
eight arrangements ~ 
are (approximately) 
equally likely. 


(a) Write down all 
eight arrangements 
of the sexes of three 
children. What is the probability of any one of these 
arrangements? 

(b) Let X be the number of girls the couple has. 
What is the probability that X = 2? 

(c) Starting from your work in part (a), find the 
distribution of X. That is, what values can X take, 
and what are the probabilities for each value? 


: 
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Unusual dice. Nonstandard dice can produce in- 
teresting distributions of outcomes. You have two 
balanced, six-sided dice. One is a standard die, with 
faces having 1, 2, 3, 4, 5, and 6 spots. The other die 
has three faces with 0 spots and three faces with 
6 spots. Find the probability distribution for the total 
number of spots Y on the up-faces when you roll 
these two dice. (Hint: Start with a picture like 
Figure 12.2 for the possible up-faces. Label the 
three 0 faces on the second die Oa, Ob, Oc in your 
picture, and similarly distinguish the three 6 faces.) 
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A taste test. A tea-drinking Canadian friend of 
yours claims to have a very refined palate. She tells 
you that she can tell if, in preparing a cup of tea, milk 
is first added to the cup and then hot tea poured into 
the cup, or the hot tea is first poured into the cup 
and then the milk is added. To test her claims, you 
prepare six cups of tea. Three have the milk added 
first and the other three the tea first. In a blind taste 
test, your friend tastes all six cups and is asked to 


identify the three that had the milk added first. 


(a) How many different ways are there to select 
three of the six cups? (Hint: See Example 12.8.) 
(b) If your friend is just guessing, what is the prob- 
ability that she correctly identifies the three cups 
with the milk added first? 


Random numbers. Many random number generators 
allow users to specify the range of the random numbers 
to be produced. Suppose you specify that the random 
number Y can take any value between 0 and 2. Then 
the density curve of the outcomes has constant height 
between 0 and 2, and height 0 elsewhere. 


(a) Is the random variable Y discrete or continuous? 
Why? 

(b) What is the height of the density curve between 
O and 2? Draw a graph of the density curve. 

(c) Use your graph from part (b) and the fact that 
probability is area under the curve to find P(Y = 1). 


More random numbers. Find these probabilities 
as areas under the density curve you sketched in 
Exercise 12.51. 


(ay POS =< Y = 13) 

(b) P(Y = 0.8) 

Survey accuracy. A sample survey contacted an 
SRS of 2854 registered voters shortly before the 2012 
presidential election and asked respondents whom 
they planned to vote for. Election results show that 
51% of registered voters voted for Barack Obama. 
We will see later that in this situation the propor- 
tion of the sample who planned to vote for Barack 
Obama (call this proportion V) has approximately 
the Normal distribution with mean p = 0.51 and 
standard deviation o = 0.009. 


(a) If the respondents answer truthfully, what is 
P(0.49 = V = 0.53)? This is the probability that 
the sample proportion V estimates the population 
proportion 0.51 within plus or minus 0.02. 

(b) In fact, 49% of the respondents said they planned 
to vote for Barack Obama (V = 0.49. If respondents 
answer truthfully, what is P(V = 0.49)? 


Friends. How many close friends do you have? Sup- 
pose that the number of close friends adults claim to 
have varies from person to person with mean w = 9 
and standard deviation @ = 2.5. An opinion poll 


12.55 


12.56 


12.57 


12.58 
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asks this question of an SRS of 1100 adults. We will 
see later, in Chapter 19, that in this situation the 
sample mean response x has approximately the Nor- 
mal distribution with mean 9 and standard deviation 
0.075. What is P(8.9 =x <9.1), the probability 
that the sample result x estimates the population 
truth w = 9 to within £0.1? 


Playing Pick 4. The Pick 4 games in many state lot- 
teries announce a four-digit winning number each 
day. Each of the 10,000 possible numbers 0000 to 
9999 has the same chance of winning. You win if 
your choice matches the winning digits. Suppose 
your chosen number is 5974. 


(a) What is the probability that the winning 
number matches your number exactly? 

(b) What is the probability that the winning num- 
ber has the same digits as your number in any order? 


Nickels falling over. You may feel that it is obvi- 
ous that the probability of a head in tossing a coin 
is about 1/2 because the coin has two faces. Such 
opinions are not always correct. Stand a nickel 
on edge on a hard, flat surface. Pound the surface 
with your hand so that the nickel falls over. What 
is the probability that it falls with heads upward? 
Make at least 50 trials to estimate the probability 
of a head. 


What probability doesn’t say. The idea of prob- 
ability is that the proportion of heads in many tosses 
of a balanced coin eventually gets close to 0.5. But 
does the actual count of heads get close to one- 
half the number of tosses? Let’s find out. Set the 
“Probability of heads” in the Probability applet to 0.5 
and the number of tosses to 50. You can extend the 
number of tosses by clicking “Toss” again to get 50 
more. Don’t click “Reset” during this exercise. 


(a) After 50 tosses, what is the proportion of heads? 
What is the count of heads? What is the difference 
between the count of heads and 25 (one-half the 
number of tosses) ? 

(b) Keep going to 150 tosses. Again record the 
proportion and count of heads and the difference 
between the count and 75 (half the number of 
tosses). 

(c) Keep going. Stop at 250 tosses and again at 

500 tosses to record the same facts. Although it may 
take a long time, the laws of probability say that the 
proportion of heads will always get close to 0.5 and 
also that the difference between the count of heads 
and half the number of tosses will always grow 
without limit. 


LeBron’s free throws. The basketball player LeBron 
James makes about three-quarters of his free throws 
over an entire season. Use the Probability applet or 
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statistical software to simulate 100 free throws shot 
by a player who has probability 0.75 of making each 
shot. (In most software, the key phrase to look for is 
“Bernoulli trials.” This is the technical term for inde- 
pendent trials with yes/no outcomes. Our outcomes 
here are “Hit” and “Miss.”) 


(a) What percent of the 100 shots did he hit? 

(b) Examine the sequence of hits and misses. How 
long was the longest run of shots made? Of shots 
missed? (Sequences of random outcomes often 
show runs longer than our intuition thinks likely.) 


Simulating an opinion poll. A 2013 opinion poll 
showed that about 33% of the American public have 
very little or no confidence in big business. Suppose 
that this is exactly true. Choosing a person at ran- 
dom then has probability 0.33 of getting one who has 
very little or no confidence in big business. Use the 
Probability applet or statistical software to simulate 
choosing many people at random. (In most software, 


Exploring the Web 


the key phrase to look for is “Bernoulli trials.” This 
is the technical term for independent trials with Yes/ 
No outcomes. Our outcomes here are “Favorable” or 
“Not Favorable.”) 


(a) Simulate drawing 50 people, then 100 people, 
then 400 people. What proportion have very little 
or no confidence in big business in each case? We 
expect (but because of chance variation we can’t 
be sure) that the proportion will be closer to 0.38 in 
longer runs of trials. 

(b) Simulate drawing 50 people 10 times and record 
the percents in each sample who have very little 

or no confidence in big business. Then simulate 
drawing 400 people 10 times and again record the 
10 percents. Which set of 10 results is less variable? 
We expect the results of samples of size 400 to be 
more predictable (less variable) than the results 

of samples of size 50. That is “long-run regularity” 
showing itself. 


12.60 Super Bowl odds. Oddsmakers often list the odds for certain sporting events 
on the web. For example, one can find the current odds of winning the next Super 
Bowl for each NFL team. We found a list of such odds at www. vegasinsider. 
com/nfl/odds/futures/. When an oddsmaker says the odds are A to B 
of winning, he or she means that the probability of winning is B/(A + B). For 
example, when we checked the website listed, the odds that the Seattle Seahawks 
would win Super Bowl XLIX were 6 to 1. This corresponds to a probability of win- 


ning of 1/(6 + 1) = 1/7. 


On the web, find the current odds, according to an oddsmaker, of winning 
the Super Bowl for each NFL team. Convert these odds to probabilities. Do these 
probabilities satisfy Rules 1 and 2 given in this chapter? If they don’t, can you 


think of a reason why? 


Lonely Planet Images/Getty Images 


robability models can describe the flow of traffic through a highway system, 
a telephone interchange, or a computer processor; the genetic makeup of 
populations; the energy states of subatomic particles; the spread of epidemics 
or rumors; and the rate of return on risky investments. Although we are inter- 
ested in probability mainly because it is the foundation for statistical inference, the 
mathematics of chance is important in many fields of study. Our introduction to 
probability in Chapter 12 concentrated on basic ideas and facts. Now we look at 
some further details. With more probability at our command, we can model more 
complex random phenomena. 
Although we won’t emphasize the math, everything in this chapter (and much 
more) follows from the four rules we met in Chapter 12. Here they are again. 


*This chapter introduces some of the mathematics of probability. The material is not needed to 
understand the rest of the book. 


13.1 


13.2 
13.3 
13.4 


13.5 
13.6 


Independence and the multi- 
plication rule 


The general addition rule 
Conditional probability 


The general multiplication 
rule 


Independence again 


Tree diagrams 
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Venn diagram 


Probability Rules 


Rule 1. For any event A, 0 = P(A) = 1. 
Rule 2. If S is the sample space, P(S) = 1. 
Rule 3. Addition rule for disjoint events: If A and B are disjoint events, 
P(A or B) = P(A) + P(B) 
Rule 4. For any event A, 
P(A does not occur) = 1 — P(A) 


Rule 3, the addition rule for disjoint events, describes the probability that one or 
the other of two events A and B occurs in the special situation when A and B can- 
not occur together. Now we will describe the probability that both events A and B 
occur, again only in a special situation. 

You may find it helpful to draw a picture to display relations among several events. 
A picture like Figure 13.1 that shows the sample space S as a rectangular area and 
events as areas within S is called a Venn diagram. The events A and B in Figure 13.1 
are disjoint because they do not overlap. The Venn diagram in Figure 13.2 illustrates 
two events that are not disjoint. The event {A and B} appears as the overlapping 
area that is common to both A and B. Can we find the probability P(A and B) that 
both events occur if we know the individual probabilities P(A) and P(B)? 


AandB 


FIGURE 13.2 
FIGURE 13.1 Venn diagram showing events A and B that are not disjoint. The 
Venn diagram showing disjoint events A and B. event {A and B} consists of outcomes common to A and B. 


HN _-S 
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That molecule in the diagram is PTC (phenylthiocarbamide), a substance with an 
unusual property: 70% of people find that it has a bitter taste, and the other 30% 
can't taste it at all. The difference is genetic, depending on a single gene. Ask two 
people chosen at random to taste PTC. We are interested in the events 


A = {first person can taste PTC} 
B = {second person can taste PTC} 


We know that P(A) = 0.7 and P(B) = 0.7. What is the probability P(A and B) that 
both can taste PTC? 


13.1. Independence and the Multiplication Rule 


We can think our way to the answer. The first person chosen can taste PTC 
in 70% of all samples and then the second person can taste it in 70% of those 
samples. We will get two tasters in 70% of 70% of all samples. That’s P(A and B) = 
0.7 X 0.7 = 0.49. a 
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The argument in Example 13.1 works because knowing that the first person can 
taste PTC tells us nothing about the second person. The probability is still 0.7 that 
the second person can taste PTC whether or not the first person can. We say that 
the events “first person can taste PTC” and “second person can taste PTC” are 
independent events. Now we have another rule of probability. 


Multiplication Rule for Independent Events 


Two events A and B are independent if knowing that one occurs does not change 
the probability that the other occurs. If A and B are independent, 


P(A and B) = P(A)P(B) 


To use this multiplication rule, we must decide whether events are independent. 
Here are some examples to help you recognize when you can assume events are 
independent. 

In Example 13.1, we think that the ability of one randomly chosen person to 
taste PTC tells us nothing about whether or not a second person, also randomly 
chosen, can taste PTC. That’s independence. But if the two people are members of 
the same family, the fact that ability to taste PTC is inherited warns us that they are 
not independent. 

Independence is clearly recognized in artificial settings such as games of chance. 
Because a coin has no memory and most coin tossers cannot influence the fall of the 
coin, it is safe to assume that successive coin tosses are independent, so that the 
probability of three heads in succession is 0.5 X 0.5 X 0.5 = 0.125. 

On the other hand, the colors of successive cards dealt from the same deck are not 
independent. A standard 52-card deck contains 26 red and 26 black cards. For the 
first card dealt from a shuffled deck, the probability of a red card is 26/52 = 0.50. 
Once we see that the first card is red, we know that there are only 25 reds among 
the remaining 51 cards. The probability that the second card is red is therefore only 
25/51 = 0.49. Knowing the outcome of the first deal changes the probabilities for 
the second. m 


The multiplication rule extends to collections of more than two events, pro- 
vided that all are independent. Independence of events A, B, and C means that no 
information about any one or any two can change the probability of the remaining 
events. Independence is often assumed in setting up a probability model when the 
events we are describing seem to have no connection. 

If two events A and B are independent, the event that A does not occur is also 
independent of B, and so on. For example, choose two people at random and ask if 
they can taste PTC. Because 70% can taste PTC and 30% cannot, the probability 
that the first person is a taster and the second is not is (0.7)(0.3) = 0.21. 


independent events 


Ey CONDEMNED BY 
Md 


INDEPENDENCE 


Assuming independence when it 

isn’t true can lead to disaster. Several 
mothers in England were convicted 
of murder simply because two of their 
children had died in their cribs with 
no visible cause. An “expert witness” 
for the prosecution said that the 
probability of an unexplained crib 
death in a nonsmoking middle-class 
family is 1/8500. He then multiplied 
1/8500 by 1/8500 to claim that there 
is only a | in 73 million chance that 
two children in the same family could 
have died naturally. This is nonsense: 
it assumes that crib deaths are 
independent, and data suggest that 
they are not. Some common genetic 
or environmental cause, not murder, 
probably explains the deaths. 
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During World War Il, the British found that the probability of a bomber’s 
being lost through enemy action on a mission over occupied Europe 
was 0.05. The probability that the bomber returned safely from a 
mission was therefore 0.95. It is reasonable to assume that missions 
were independent. Take A; to be the event that a bomber survived its 
ith mission. The probability of surviving two missions is 
P(A, and A;) = P(A,)P(A;) 

= (0.95)(0.95) = 0.9025 
The multiplication rule also applies to more than two independent 
events, so the probability of surviving three missions is 
© CORBIS = (0.95)(0.95)(0.95) = 0.8574 


In 1941, the tour of duty for an airman was established as 30 missions. The prob- 
ability of surviving 30 missions is only 
P(A, and Az and... and Azo) = P(A;)P(Az) --* P(Aso) 
= (0.95)(0.95)--:(0.95) 
= (0.95) = 0.2146 


The probability of surviving two tours of duty was much smaller. m 


Here is another example of using the multiplication rule for independent events 
to compute probabilities. 


AIDS, don’t come back to learn the test results. Clinics now use “rapid HIV tests” that 
give a result while the client waits. In a clinic in Malawi, for example, use of rapid tests 
increased the percent of clients who learned their test results from 69% to 99.7%. 

The trade-off for fast results is that rapid tests are less accurate than slower 
laboratory tests. Applied to people who have no HIV antibodies, one rapid test has 
probability about 0.004 of producing a false positive (that is, of falsely indicating that 
antibodies are present).' If a clinic tests 200 people who are free of HIV antibodies, 
what is the chance that at least one false positive will occur? 


42 (STATE? Many people who come to clinics to be tested for HIV, the virus that causes 


PLAN: It is reasonable to assume that the test results for different individuals are 
independent. We have 200 independent events, each with probability 0.004. What is 
the probability that at least one of these events occurs? 


ESA “At least one” combines many outcomes. It is much easier to use Rule 4 
(page 304) which says that 
P(at least one positive) = 1 — P(no positives) 


and find P(no positives) first. 

The probability of a negative result for any one person is 1 — 0.004 = 0.996. 
To find the probability that all 200 people tested have negative results, use the 
multiplication rule: 

P(no positives) = P(all 200 negative) 
= (0.996)(0.996) --- (0.996) 
= 0.996°° = 0.4486 
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The probability we want is therefore 


P(at least one positive) = 1 — 0.4486 = 0.5514 


[SOMSEUDA The probability is greater than 1/2 that at least one of the 200 people 
will test positive for HIV, even though no one has the virus. m 


pendent but not otherwise. The addition rule P(A or B) = P(A) + P(B) holds 

if A and B ave disjoint but not otherwise. Resist the temptation to use these 

simple rules when the circumstances that justify them are not present. You 
A must also be careful not to confuse disjointness and independence. If A and B 

are disjoint, then the fact that A occurs tells us that B cannot occur—look 
again at Figure 13.1. So disjoint events are not independent. Unlike disjointness, we 
cannot depict independence in a Venn diagram because it involves the probabilities 
of the events rather than just the outcomes that make up the events. 


A The multiplication rule P(A and B) = P(A)P(B) holds if A and B are inde- 


() LaunchPad Online Resources 


* The StatBoards video, Independence and the Multiplication Rule, discusses 
independence and provides an example of using the multiplication rule for 
independent events. 


Apply Your Knowledge 


13.1 Older College Students. Government data show that 8% of adults are 
full-time college students and that 30% of adults are aged 55 or older. None- 
theless, we can’t conclude that because (0.08)(0.30) = 0.024, about 2.4% of 
adults are college students 55 or older. Why not? 


13.2 Common Names. The U.S. Census Bureau says that the 10 most common 
names in the United States are (in order) Smith, Johnson, Williams, Brown, 
Jones, Miller, Davis, Garcia, Rodriguez, and Wilson. These names account 
for 9.6% of all U.S. residents. Out of curiosity, you look at the authors of the 
textbooks for your current courses. There are 9 authors in all. Would you 
be surprised if none of the names of these authors were among the 10 most 
common? (Assume that authors’ names are independent and follow the same 
probability distribution as the names of all residents.) 


13.3 Lost Internet Sites. Internet sites often vanish or move, so that references 
to them can’t be followed. In fact, 13% of Internet sites referenced in major 
scientific journals are lost within two years after publication.’ If a paper con- 
tains seven Internet references, what is the probability that all seven are still 
good two years later? What specific assumptions did you make to calculate 
this probability? 


We know that if A and B are disjoint events, then P(A or B) = P(A) + P(B). If 
events A and B are not disjoint, they can occur together. The probability that one 
or the other occurs is then less than the sum of their probabilities. As Figure 13.3 
illustrates, outcomes common to both are counted twice when we add probabilities, 
so we must subtract this probability once. Here is the addition rule for any two 
events, disjoint or not. 
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FIGURE 13.3 

The general addition rule: for any 
events A and B, P(A or B) = P(A) + 
P(B) — P(Aand B). 


Outcomes here are double- 
counted by P(A) + P(B). 


Addition Rule for Any Two Events 


For any two events A and B, 


P(A or B) = P(A) + P(B) — P(A and B) 


If A and B are disjoint, the event {A and B} that both occur contains no out- 
comes and therefore has probability 0. Because the general addition rule includes 
Rule 3, the addition rule for disjoint events, we can always use the general addition 
tule to find {A or B}. 


Motor vehicles sold in the United States (ignoring heavy trucks) are classified as 
either cars or light trucks and as either domestic or imported. “Light trucks” include 
SUVs and minivans. “Domestic” means made in Canada, Mexico, or the United States, 
so that a Toyota made in Canada counts as domestic. 

In 2012, 78.3% of the new vehicles sold to individuals were domestic, 49.5% 
were light trucks, and 42.1% were domestic light trucks.? Choose a vehicle sale at 
random. Then 


P(domestic or light truck) = P(domestic) + P(light truck) — P(domestic light truck) 
= 0.783 + 0.495 — 0.421 = 0.857 


That is, 85.7% of vehicles sold were either domestic or light trucks (or both). A 
vehicle is an imported car if it is neither domestic nor a light truck. So 


P(imported car) = 1 — 0.857 = 0.143 m 


Venn diagrams clarify events and their probabilities because you can just think of 
adding and subtracting areas. Figure 13.4 shows all the events formed from “domestic” 


Neither D nor T 
0.143 


T and not D 
0.074 


D and not T 
0.362 


FIGURE 13.4 
Venn diagram and probabilities for mo- 
tor vehicle sales, for Example 13.5. D =vehicle is domestic T = vehicle is a light truck 
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and “truck” in Example 13.5. The four probabilities that appear in the figure add to 
1 because they refer to four disjoint events that make up the entire sample space. 
All of these probabilities come from the information in Example 13.5. For example, 
the probability that a randomly chosen vehicle sale is a domestic car (“D and not 
T” in the figure) is 
P(domestic car) = P(domestic) — P(domestic light truck) 
= 0.783 — 0.421 = 0.362 


() LaunchPad Online Resources 


° The StatClips Examples videos, Probability: Basic Rules Example A and 
Probability: Basic Rules Example C, both illustrate Venn diagrams and the 
general addition rule. 


Apply Your Knowledge 


13.4 Photo and Video Sharing. Photos and videos have become an important 
part of the online social experience, with more than half of Internet users post- 
ing photos or videos online that they have taken themselves. Let A be the event 
an Internet user posts photos that they have taken themselves, and B be the 
event an Internet user posts videos that they have taken themselves. Pew Re- 
search Center finds that P(A) = 0.52, P(B) = 0.26, and P(A or B) = 0.54.4 


(a) Make a Venn diagram similar to Figure 13.4 showing the events 
{A and B}, {A and not B}, {B and not A}, and {neither A nor B}. 


(b) Describe each of these events in words. 
(c) Find the probabilities of all four events, and add the probabilities to 
your Venn diagram. 


13.5 College Degrees. Of all college degrees awarded in the United States, 
50% are bachelor’s degrees, 59% are earned by women, and 29% are bach- 
elor’s degrees earned by women. Make a Venn diagram and use it to answer 
these questions. 

(a) What percent of all degrees are earned by men? 


(b) What percent of all degrees are bachelor’s degrees earned by men? 


(c) Are the events “earning a bachelor’s degree” and “being a man” 
independent? Why? Barry Austin Photography/Getty Images 


The probability we assign to an event can change if we know that some other 
event has occurred. This idea is the key to many applications of probability. 


Figure 13.4, based on the information in Example 13.5, gives the following prob- 
abilities for a randomly chosen light motor vehicle sold at retail in the United States: 


Domestic Imported Total 
Light truck 0.421 0.074 0.495 
Car 0.362 0.143 0.505 
Total 0.783 0.217 1 
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The four probabilities in the body of the table add to 1 because they describe all vehi- 
cles sold. We obtain the “Total” row and column from these probabilities by the addition 
rule. For example, the probability that a randomly chosen vehicle is a light truck is 


P(truck) = P(truck and domestic) + P(truck and imported) 
= 0.421 + 0.074 = 0.495 
Suppose we are told that the vehicle chosen is imported. That is, it is one of the 21.7% 


in the “Imported” column of the table. The probability that a vehicle is a light truck, given 
the information that it is imported, is the proportion of trucks in the “Imported” column, 


0.074 
P(truck | i rted) = = 0.341 
(truck | imported) ooi7 
conditional probability This is a conditional probability. You can read the bar | as “given the information that.” m 


Although 49.5% of all vehicles sold are trucks, only 34.1% of imported vehicles 
are trucks. It’s common sense that knowing that one event (the vehicle is imported) 
occurs often changes the probability of another event (the vehicle is a truck). The 
example also shows how we should define conditional probability. The idea of a 
conditional probability P(B | A) of one event B given that another event A occurs 
is the proportion of all occurrences of A for which B also occurs. 


Conditional Probability 


When P(A) > 0, the conditional probability of B given A is 
P(A and B) 


The conditional probability P(B | A) makes no sense if the event A can 

never occur, so we require that P(A) > 0 whenever we talk about P(B | A). 

Be sure to keep in mind the distinct roles of the events A and B in P(B|A). 

Event A represents the information we are given, and B is the event whose probabil- 
ity we are calculating. Here is an example that emphasizes this distinction. 


What is the conditional probability that a randomly chosen vehicle is imported, given 
the information that it is a truck? Using the definition of conditional probability, 


P(imported and truck) 
P(truck) 


P(imported | truck) = 


Only 14.9% of trucks sold are imports. m 


Be careful not to confuse the two different conditional probabilities 


P(truck | imported) = 0.341 
P(imported | truck) = 0.149 


The first answers the question, “What proportion of imports are trucks?” The sec- 
q prop p 
ond answers, “What proportion of trucks are imports?” 
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* The StatBoards video, Conditional Probabilities, illustrates the computation 
of conditional probabilities. 


Apply Your Knowledge 


13.6 Photo and Video Sharing. In the setting of Exercise 13.4, what is the con- 
ditional probability that an Internet user posts photos that they have taken 
themselves, given that they post videos that they have taken themselves? 
(Hint: First, organize the four probabilities in Exercise 13.4 in a table as in 
Example 13.6.) 


13.7 College Degrees. In the setting of Exercise 13.5, what is the conditional 
probability that a degree is earned by a woman, given that it is a bachelor’s 


degree? 
13.8 Computer Games. Here is the distribution of computer games sold by type 
of game: 
Game Type Probability 
Strategy 0.354 
Role playing 0.139 
Family entertainment O27 
Shooters 0.109 
Children’s 0.057 
Other 0.214 


What is the conditional probability that a computer game is a role-playing 
game, given that it is not a strategy game? 


The definition of conditional probability reminds us that in principle all prob- 
abilities, including conditional probabilities, can be found from the assignment of 
probabilities to events that describes a random phenomenon. More often, however, 
conditional probabilities are part of the information given to us in a probability 
model. The definition of conditional probability then turns into a rule for finding 
the probability that both of two events occur. 


Multiplication Rule for Any Two Events 


The probability that both of two events A and B happen together can be found by 
P(A and B) = P(A)P(B| A) 


Here P(B | A) is the conditional probability that B occurs, given the information that 
A occurs. 


In words, this rule says that for both of two events to occur, first one must occur 
and then, given that the first event has occurred, the second must occur. This is just 
common sense expressed in the language of probability, as the following example 
illustrates. 


Py WINNING THE 
ate 


LOTTERY TWICE 


In 1986, Evelyn Marie Adams won 
the New Jersey lottery for the second 
time, adding $1.5 million to her 
previous $3.9 million jackpot. The 
New York Times claimed that the odds 
of one person winning the big prize 
twice were 1 in 17 trillion. Nonsense, 
said two statisticians in a letter to the 
Times. The chance that Evelyn Marie 
Adams would win twice is indeed 
tiny, but it is almost certain that 
someone among the millions of lottery 
players would win two jackpots. 

Sure enough, Robert Humphries 
won his second Pennsylvania lottery 
jackpot ($6.8 million total) in 1988, 
and more recently Ernest Pullen of 
St. Louis won the Missouri lottery 

in June of 2010 and then again in 
September of 2010 ($3 million total). 
When commenting on the double 
win, Ernest Pullen said he considers 
himself to be a “lucky guy.” 
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The Pew Internet and American Life Project finds that 95% of teenagers (aged 12 to 
17) use the Internet, and that 81% of online teens use some kind of social media.® 
What percent of teens are online and use some kind of social media? 

Use the multiplication rule: 


P(online) = 0.95 
P(use social media | online) = 0.81 
P(online and use social media) = P(online) X P(use social media | online) 
= (0.95)(0.81) = 0.7695 
That is, about 77% of all teens are online and use some kind of social media. 
You should think your way through this: if 95% of teens are online and 81% of 
these use some kind of social media, then 81% of 95% are both online and use 
some kind of social media. 


It is important to remember that the conditional probability of an event generally 
depends on the event we condition on. Although we have shown 


P(use social media | online) = 0.81, 
since someone who is not online cannot be using social media, we have 


P(use social media | not online) = O. m 


We can extend the multiplication rule to find the probability that all of several 
events occur. The key is to condition each event on the occurrence of all of the 
preceding events. So for any three events A, B, and C, 


P(A and B and C) = P(A)P(B | A)P(C | both A and B) 


Here is an example of the extended multiplication rule. 


(STATE? A charity raises funds by calling a list of prospective donors to ask for pledges. 
It is able to talk with 40% of the names on its list. Of those the charity reaches, 30% 
make a pledge. But only half of those who pledge actually make a contribution. What 
percent of the donor list contributes? 


PLAN: Express the information we are given in terms of events and their probabilities: 


If A = {the charity reaches a prospect} then P(A)=04 
If B = {the prospect makes a pledge} then P(B|A) = 0.3 
If C = {the prospect makes a contribution} then P(C|bothAandB) = 0.5 


We want to find P(A and B and C). 
ESQRVEA Use the multiplication rule: 
P(A and Band C) = P(A)P(B|A)P(C| both A and B) 
= 04 X 0.3 X 0.5 = 0.06 


[Fea SEA Only 6% of the prospective donors make a contribution. m 


As Example 13.9 illustrates, formulating a problem in the language of probability 
is often the key to success in applying probability ideas. 


13.5 Independence Again 31 3 


Apply Your Knowledge 


13.9 Atthe Gym. Suppose that 10% of adults belong to health clubs, and 40% of 
these health club members go to the club at least twice a week. What percent 
of all adults go to a health club at least twice a week? Write the information 
given in terms of probabilities and use the general multiplication rule. 


13.10 Teens, Use of Social Media. We saw in Example 13.8 that 95% of teenag- 
ers are online, and that 81% of online teens use some kind of social media. Of 42 
online teens who use some kind of social media, 91% have posted a photo of 
themselves. What percent of all teens are online, use social media, and have 
posted a photo of themselves? Define events and probabilities, and follow the 
pattern of Example 13.9. 


13.11 The Probability of a Flush. A poker player holds a flush when all five cards 
in the hand belong to the same suit (clubs, diamonds, hearts, or spades). We 
will find the probability of a flush when five cards are drawn in succession 
from the top of the deck. Remember that a deck contains 52 cards, 13 of each 
suit, and that when the deck is well shuffled, each card drawn is equally likely 
to be any of those that remain in the deck. 


The Photo Works 


(a) Concentrate on spades. What is the probability that the first card 
drawn is a spade? What is the conditional probability that the second 
card drawn is a spade, given that the first is a spade? (Hint: How many 
cards remain? How many of these are spades?) 

(b) Continue to count the remaining cards to find the conditional 
probabilities of a spade for the third, the fourth, and the fifth card 
drawn, given in each case that all previous cards are spades. 

(c) The probability of drawing five spades in succession from the top of 
the deck is the product of the five probabilities you have found. Why? 
What is this probability? 

(d) The probability of drawing five hearts or five diamonds or five clubs is 
the same as the probability of drawing five spades. What is the probability 
that the five cards drawn all belong to the same suit? 


The conditional probability P(B|A) is generally not equal to the unconditional 
probability P(B). That’s because the occurrence of event A generally gives us some 
additional information about whether or not event B occurs. If knowing that A occurs 
gives no additional information about B, then A and B are independent events. The 
precise definition of independence is expressed in terms of conditional probability. 


Independent Events 


Two events A and B that both have positive probability are independent if 
P(B| A) = P(B) 


We now see that the multiplication rule for independent events, P(A and 
B) = P(A)P(B), is a special case of the general multiplication rule, P(A and B) = 
P(A)P(B|A), just as the addition rule for disjoint events is a special case of the 
general addition rule. We rarely use the definition of independence because most 
often independence is part of the information given to us in a probability model. 
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Apply Your Knowledge 


13.12 Independent? The Clemson University Fact Book and Data Center 
for 2013 shows that 101 of the university’s 253 assistant professors were 
women, along with 97 of the 301 associate professors and 71 of the 350 full 
professors. 


(a) What is the probability that a randomly chosen Clemson professor (of 
any rank) is a woman? 

(b) What is the conditional probability that a randomly chosen professor is 
a woman, given that the person chosen is a full professor? 

(c) Are the rank and sex of Clemson professors independent? How do 
you know? 


Probability models often have several stages, with probabilities at each stage 
conditional on the outcomes of earlier stages. These models require us to com- 
bine several of the basic rules into a more elaborate calculation. Here is an 
example. 


tree diagram 


(STATE? The number of adult Internet users who take photos or videos they have 
found online and post them on sites designed for sharing images with many people 
continues to increase. Looking only at adult Internet users, aged 18 and over, about 
15% are 18 to 29 years old, another 26% are 30 to 49 years old, and the remaining 
59% are 50 and over. The Pew Internet and American Life Project finds that 68% of 
Internet users aged 18 to 29 have posted photos or videos they have found online, 
along with 54% of those aged 30 to 49 and 26% of those 50 or older.” What per- 
cent of all adult Internet users post photos or videos that they have found online? 


PLAN: To use the tools of probability, restate all these percents as probabilities. If 
we choose an adult Internet user at random, 
P(aged 18 to 29) = 0.15 
P(aged 30 to 49) = 0.26 
P(aged 50 and older) = 0.59 
These three probabilities add to 1 because all adult Internet users are in one of the 
three age groups. The percents of each group who post photos or videos they have 
found online are conditional probabilities: 
P(post yes | aged 18 to 29) = 0.68 
P(post yes | aged 30 to 49) = 0.54 
P(post yes | aged 50 and older) = 0.26 


We want to find the unconditional probability P(post yes). 


EES The tree diagram in Figure 13.5 organizes this information. Each segment 
in the tree is one stage of the problem. Each complete branch shows a path through 
the two stages. The probability written on each segment is the conditional probability 
of an Internet user following that segment, given that he or she has reached the node 
from which it branches. 
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FIGURE 13.5 
Probability Tree diagram for who shares images and 
videos online, for Example 13.10. The 
e 
ee Be aes three disjoint paths to the outcome that 
18to29 e 0.32 an adult Internet user posts photos or 
ac Meade : Bene videos that they have found online are 
e Post no i 
015 colored red. 
0.54 @ Post yes 0.1404 
Adult 026 ee 


Internet # ————————- 30 to 49 e 0.46 


users a 
e Post no 0.1196 
0.59 
0.26 © Post yes 0.1534 
a . Se 


e Post no 0.4366 


Starting at the left, an Internet user falls into one of the three age groups. The 
probabilities of these groups mark the leftmost segments in the tree. Look at age 
18 to 29, the top branch. The two segments going out from the “18 to 29” branch 
point carry the conditional probabilities 


P(post yes | aged 18 to 29) = 0.68 
P(post no | aged 18 to 29) = 0.32 


The full tree shows the probabilities for all three age groups. 
Now use the multiplication rule. The probability that a randomly chosen Internet 
user is an 18- to 29-year-old who posts photos or images they have found online is 


P(aged 18 to 29 and post yes) = P(aged 18 to 29)P(post yes | aged 18 to 29) 
= (0.15)(0.68) = 0.1020 


This probability appears at the end of the topmost branch. The multiplication rule 
says that the probability of any complete branch in the tree is the product of the 
probabilities of the segments in that branch. 

There are three disjoint paths to “post yes,” one for each of the three age groups. 
These paths are colored red in Figure 13.5. Because the three paths are disjoint, the 
probability that an adult Internet user posts photos or videos they have found online 
is the sum of their probabilities: 


P(post yes) = (0.15)(0.68) + (0.26)(0.54) + (0.59)(0.26) 
= 0.1020 + 0.1404 + 0.1534 = 0.3958 


TEER Almost 40% of all adult Internet users have posted photos or videos 
they have found online. m 


It takes longer to explain a tree diagram than it does to use it. Once you have 
understood a problem well enough to draw the tree, the rest is easy. Here is another 
question about posting photos or videos online that the tree diagram helps us 
answer. 
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(STATE? What percent of adult Internet users who post photos or videos they have 
found online are aged 18 to 29? 


PLAN: In probability language, we want the conditional probability P(aged 18 to 
29 | post yes). Use the tree diagram and the definition of conditional probability: 


P(aged 18 to 29 and post yes) 
P(post yes) 


ESXQSRYEA Look again at the tree diagram in Figure 13.5. P(post yes) is the sum of 
the three red probabilities, as in Example 13.10. P(aged 18 to 29 and post yes) is 
the result of following just the top branch in the tree diagram. So 


P(aged 18 to 29| post yes) = 


P(aged 18 to 29 and post yes) 
P(post yes) 


P(aged 18 to 29 | post yes) = 


_ 0.1020 
0.3958 


TESTER About 26% of adults who post photos or videos that they have found 
online are between 18 and 29 years old. Compare this conditional probability with 
the original information (unconditional) that 15% of adult Internet users are between 
18 and 29 years old. Knowing that a person posts photos or videos found online 
increases the probability that he or she is young. m 


= 0.2577 


Examples 13.10 and 13.11 illustrate a common setting for tree diagrams. Some 
outcome (such as posting photos or videos found online ) has several sources (such 
as the three age groups). Starting from 


= the probability of each source, and 
= the conditional probability of the outcome given each source 


the tree diagram leads to the overall probability of the outcome. Example 13.10 does 
this. You can then use the probability of the outcome and the definition of con- 
ditional probability to find the conditional probability of one of the sources, given 
that the outcome occurred. Example 13.11 shows how. 


(9) LaunchPad Online Resources 


* The StatBoards video, General Mutiplication Rule and Tree Diagrams, 
provides an example of creating a tree diagram to visually show how to 
organize a set of conditional probabilities and then use the general 
multiplication rule to compute probabilities along the paths of the tree. 


Apply Your Knowledge 


13.13 PSA Screening for Prostate Cancer. The prostate-specific antigen 
(PSA) test is a simple blood test to screen for prostate cancer. It has been 
used in men over 50 as a routine part of a physical exam, with levels above 
4 ng/mL indicating possible prostate cancer. The test result is not always cor- 
rect, sometimes indicating prostate cancer when it is not present and often 
missing prostate cancer that is present. Here are the approximate conditional 
probabilities of a positive (above 4 ng/mL) and negative test result given can- 
cer is present or cancer is absent:® 


13.6 Tree Diagrams Bo1i7 


Test Result 
Positive Negative 


Cancer present 0.21 0.79 
Cancer absent 0.06 0.94 


In a large study of prostate cancer screening, it was found that about 6.3% of 
the population has prostate cancer. 


(a) Draw a tree diagram for selecting a person from this population (out- 
comes: cancer present or absent) and testing his blood (outcomes: test 
positive or negative). 

(b) What is the probability that the test is positive for a randomly chosen 
person from this population? 


13.14 Eye Color, Hair Color, and Freckles. A large study of children of Caucasian 
descent in Germany looked at the effect of eye color, hair color, and freckles on 
the reported extent of burning from sun exposure.’ The population’s distribution 
of hair color, eye color, and freckles was as shown in the tree diagram of Figure 
13.6. Find the following probabilities and describe them in plain English: 


(a) P(blue eyes | red hair) and P (blue eyes and red hair). 
(b) P(freckles | red hair and blue eyes) and P(freckles and red hair and 


blue eyes). 
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ity POLITICALLY 
i) CORRECT 


In 1950, the Soviet mathematician 
B. V. Gnedenko (1912-1995) wrote 
The Theory of Probability, a text that 
was popular around the world. The 
introduction contains a mystifying 
paragraph that begins, “We note 
that the entire development of 
probability theory shows evidence 
of how its concepts and ideas were 
crystallized in a severe struggle 
between materialistic and idealistic 
conceptions.” It turns out that 


“materialistic” is jargon for “Marxist- 


Leninist.” It was good for the health 
of Soviet scientists in the Stalin 
era to add such statements to their 


books. 


13.15 False PSA Positives. Continue your work from Exercise 13.13. The 
probabilities given in the table of Exercise 13.13 are properties of the PSA 
test. The false-positive rate is a property of the PSA test as applied to a 
population. 


(a) Suppose that 6.3% of the population has prostate cancer. What is the 
probability that a person does not have cancer, given that the PSA test 
is positive? This is the false-positive rate. 

(b) Treatment for prostate cancer can have serious side effects, including 
incontinence and impotence, and many men diagnosed with prostate 
cancer would eventually die of another cause if the cancer were left 
untreated. In fact, it has been found in some large clinical trials that 
screening did not reduce overall mortality.!° In October 2011, the 
U.S. Preventive Services Task Force (USPSTF) released a draft report 
in which it recommended against using the PSA test to screen for 
prostate cancer in the general population. Based on the facts given 
about prostate cancer and the calculations that you have done, explain 
simply the reasoning behind the USPSTF recommendation. 

13.16 Eye Color, Hair Color, and Freckles, Continued. Continue your work 
from Exercise 13.14 using the tree diagram of Figure 13.6. Find the following 
probabilities and describe them in plain English: 


(a) P(red hair), P(blue eyes), and P (freckles). 
(b) P(freckles and red hair) and P(freckles | red hair). 


(c) What can you say about the events “freckles” and “red hair” in this 
population? 


CHAPTER 13 SUMMARY 


Chapter Specifics 


m Events A and B are disjoint if they have no outcomes in common. In that case, 
P(A or B) = P(A) + P(B). 


mu The conditional probability P(B | A) of an event B given an event A is defined by 
P(A and B) 
P(A) 


when P(A) > 0. In practice, we most often find conditional probabilities from directly 
available information rather than from the definition. 


P(B|A) = 


m Events A and B are independent if knowing that one event occurs does not change the 
probability we would assign to the other event; that is, P(B | A) = P(B). In that case, 
P(A and B) = P(A)P(B). 


m Any assignment of probability obeys these rules: 


Addition rule for disjoint events: If events A, B, C, .. . are all disjoint in pairs, 
then 


P(A or Bor Cor...) = P(A) + P(B) + P(C) +-: 


Multiplication rule for independent events: If events A, B, C, . . . are indepen- 
dent, then 


P(all of these events occur) = P(A)P(B)P(C)-:- 


General addition rule: For any two events A and B, 
P(A or B) = P(A) + P(B) — P(A and B) 
General multiplication rule: For any two events A and B, 
P(A and B) = P(A)P(B|A) 


mu Tree diagrams organize probability models that have several stages. 


Link It 


Probability models provide the important connection between the observed data and the 
process that generated the data. Probability is also the foundation of statistical inference 
and provides the language by which we answer questions and draw conclusions from our 
data. For these reasons, it is important that we have at least a basic understanding of what 
we mean by probability and some of the rules and properties of probabilities. 

Chapter 12 introduced basic ideas and facts about probability and in this chapter we 
have considered some further details. The conditional distributions discussed in Chapter 
6 are an example of the computation of conditional probabilities. Conditional probabili- 
ties play a central role when studying the relationship between two categorical variables 
and will be revisited in Chapter 24. In terms of later chapters, the most important idea 
of this chapter is independence. This is an assumption that many statistical procedures 
make about the observations in a data set, and it is an important assumption to fully 
understand. 


(9) LaunchPad Online Resources 


If you are having difficulty with any of the sections of this chapter, these online 
resources should help prepare you to solve the exercises at the end of this chapter. 


* StatTutor starts with a video review of each section and asks a series of 
questions to check your understanding. 


* LearningCurve provides you with a series of questions about the chapter 
geared to your level of understanding. 
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| CHECK YOUR SKILLS 


13.17 


13.18 


13.19 


An instant lottery game gives you probability 0.02 of 
winning on any one play. Plays are independent of 
each other. If you play three times, the probability 
that you win on none of your plays is about 


(a) 0.98. (b) 0.94. (c) 0.000008. 


The probability that you win on one or more of 
your three plays of the game in Exercise 13.17 is 
about 


(a) 0.02. (b) 0.06. (c) 0.999992. 


An athlete suspected of having used steroids is given 
two tests that operate independently of each other. 
Test A has probability 0.9 of being positive if steroids 
have been used. Test B has probability 0.8 of being 
positive if steroids have been used. What is the prob- 
ability that at least one test is positive if steroids have 
been used? 


(a) 0.98 (b) 0.72 (c) 0.28 


What is the distribution of doctorates conferred by field and sex? 
Here are the counts from the most popular fields in 2012."' The 
physical sciences include mathematics and computer and informa- 
tion sciences; the life sciences include agricultural sciences/natural 
resources, biological/biomedical sciences, and health sciences; and 
the social sciences include psychology. 


MALE FEMALE | TOTAL 
Engineering 6,527 1,883 8,410 
Physical sciences 6,393 DAS pill 8,944 
Life sciences Sos 6,698 | 12,029 
Social sciences 3,488 4,861 8,349 
Education 1,501 3,297 4,798 
Humanities 2,654 2,847 SO 
Other 1,496 1,425 2,921 
Total 27,390 23,562 | 50,592 


13.20 


13.21 


13.22 


13.23 


13.24 


13.25 


13.26 


Exercises 13.20 to 13.23 are based on this table. 


Choose a doctoral recipient at random from this group. 
The probability that the recipient is male is about 


(a) 0.46. (b) 0.54. (c) 0.56. 


The conditional probability that the recipient is 
male, given that the degree is in education, is about 


(a) 0.28. (b) 0.31. (c) 0.46. 


The conditional probability that the degree is in edu- 
cation, given that the recipient is female, is about 


(a) 0.05. (b) 0.14. (c) 0.18. 


Let A be the event that the degree is in engineering 
and B the event that the recipient is a female. The 
proportion of engineering doctorates conferred on 
females is expressed in probability notation as 

(a) P(A and B). (b) P(A|B). (c) P(B| A). 
Choose an American adult at random. The probabil- 
ity that you choose a woman is 0.52. The probability 
that the person you choose has never married is 0.25. 
The probability that you choose a woman who has 
never married is 0.11. The probability that the per- 
son you choose is either a woman or never married 
(or both) is therefore about 


(a) 0.77. (b) 0.66. (c) 0.13. 


Of people who died in the United States in recent 
years, 86% were white, 12% were black, and 2% 
were Asian. (This ignores a small number of deaths 
among other races.) Diabetes caused 2.8% of deaths 
among whites, 4.4% among blacks, and 3.5% among 
Asians. The probability that a randomly chosen 
death is a white person who died of diabetes is about 


(a) 0.107. (b) 0.030. (c) 0.024. 


Using the information in Exercise 13.25, the prob- 
ability that a randomly chosen death was due to dia- 
betes is about 


(a) 0.107. (b) 0.030. (c) 0.024. 


|CHAPTER 13 EXERCISES 


13.27 


13.28 


Playing the lottery. New York State’s “Quick Draw” 
lottery moves right along. Players choose between 
one and 10 numbers from the range 1 to 80; 20 win- 
ning numbers are displayed on a screen every four 
minutes. If you choose just one number, your prob- 
ability of winning is 20/80, or 0.25. Lester plays one 
number eight times as he sits in a bar. What is the 
probability that all eight bets lose? 


Universal blood donors. People with type O-negative 
blood are referred to as universal donors, although 
if you give type O-negative blood to any patient you 


13.29 


run the risk of a transfusion reaction due to cer- 
tain antibodies present in the blood. However, any 
patient can receive a transfusion of O-negative red 
blood cells. Only 7.2% of the American population 
have O-negative blood. If 10 people appear at ran- 
dom to give blood, what is the probability that at 
least one of them is a universal donor? 


Playing the slots. Slot machines are now video 
games, with outcomes determined by random num- 
ber generators. In the old days, slot machines were 
like this: you pull the lever to spin three wheels; each 


13.30 


13.31 


13.32 


wheel has 20 
symbols, all 
equally likely 


to show when 
the wheel stops 
spinning; the 
three wheels are 
independent of 
each other. Sup- 
pose that the middle wheel has nine cherries among 
its 20 symbols, and the left and right wheels have one 
cherry each. 


(a) You win the jackpot if all three wheels show cher- 
ries. What is the probability of winning the jackpot? 
(b) There are three ways that the three wheels can 
show two cherries and one symbol other than a 
cherry. Find the probability of each of these ways. 
(c) What is the probability that the wheels stop 
with exactly two cherries showing among them? 


A whale of a time. Hacksaw’s Boats of St. Lucia 
takes tourists on a daily dolphin/whale watch 
cruise. Its bro- 
chure claims an 
85% chance of 
sighting a whale 
or a dolphin, and 
you can assume 
that sightings 
from day to day 
are independent. 


(a) If you take the dolphin/whale watch cruise on 
two consecutive days, what is the probability that 
you see a whale or a dolphin on both days? 

(b) If you take the dolphin/whale watch cruise on 
two consecutive days, what is the probability you 
will see a dolphin or a whale on at least one day? 
(Hint: First compute the probability that this event 
does not occur.) 

(c) If you want to have a 99% probability of seeing 

a whale or a dolphin at least once, what is the 
minimum number of days that you will need to take 
the cruise? 

Tendon surgery. You have torn a tendon and are 
facing surgery to repair it. The surgeon explains the 
risks to you: infection occurs in 3% of such opera- 
tions, the repair fails in 14%, and both infection and 
failure occur together in 1%. What percent of these 
operations succeed and are free from infection? Fol- 
low the four-step process in your answer. 


A whale of a time, continued. Hacksaw’s Boats 
of St. Lucia takes tourists on a daily dolphin/whale 
watch cruise. Its brochure claims an 85% chance 
of sighting a whale or a dolphin. Suppose there is a 
65% chance of seeing dolphins and a 10% chance 
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13.34 


13.35 


P(A) = 0.4 
P(A and D) = 0.1 


Exercises 
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of seeing both dolphins and whales. Make a Venn 
diagram. Then answer these questions. 


(a) What is the probability of seeing a whale on the 
cruise? 

(b) What is the probability of seeing a whale but 
not a dolphin? 

(c) Are “seeing a whale” and “seeing a dolphin” 
independent events? 

Tendon surgery, continued. You have torn a tendon 
and are facing surgery to repair it. The surgeon explains 
the risks to you: infection occurs in 3% of such opera- 
tions, the repair fails in 14%, and both infection and 
failure occur together in 1%. What is the probability 
of infection, given that the repair is successful? Follow 
the four-step process in your answer. 

Screening job applicants. A company retains a psy- 
chologist to assess whether job applicants are suited 
for assembly-line work. The psychologist classifies 
applicants as one of A (well suited), B (marginal), 
or C (not suited). The company is concerned about 
the event D that an employee leaves the company 
within a year of being hired. Data on all people hired 
in the past five years give these probabilities: 

P(B) = 0.3 P(C) = 03 
P(Band D) = 0.1. P(C and D) = 0.2 
Sketch a Venn diagram of the events A, B, C, and 
D and mark on your diagram the probabilities of all 
combinations of psychological assessment and leav- 
ing (or not) within a year. What is P(D), the prob- 
ability that an employee leaves within a year? 
Cancer-detecting dogs. Research has shown that 
specific biochemical markers are found exclusively 
in the breath of patients with lung cancer. However, 
no lab test can currently distinguish the breath of 
lung cancer patients from that of other subjects. 
Could dogs be trained to identify these markers in 
specimens of human breath, as they can be to detect 
illegal substances or to follow a person’s scent? An 
experiment trained dogs to distinguish breath speci- 
mens of lung cancer patients from breath specimens 
of control individuals by using a food-reward training 
method. After the training was complete, the dogs 
were tested on new breath specimens without any 
reward or clue using a double-blind, completely ran- 
domized design. Here are the results for a random 
sample of 1286 breath specimens:” 


BREATH SPECIMEN FROM A 
DOG TEST CONTROL CANCER 
RESULT SUBJECT SUBJECT TOTAL 
Negative 708 10 718 
Positive 4 564 568 
Total le) 574 1286 
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(a) The sensitivity of a diagnostic test is its ability to 
correctly give a positive result when a person tested 
has the disease, or P (positive test | disease). Find 
the sensitivity of the dog cancer-detection test for 
lung cancer. 


Statistics provides projections for the number of 
degrees earned, classified by level and by the sex of 
the degree recipient. Here are the projected num- 
ber of earned degrees (in thousands) in the United 
States for the 2015-16 academic year:!* 


(b) The specificity of a diagnostic test is the 
conditional probability that the subject tested 


doesn’t have the disease, given that the test has 


come up negative. Find the specificity of the dog Male 


cancer- detection test for lung cancer. 


Income tax returns. Here is the distribution of 
the adjusted gross income (in thousands of dollars) 
reported on individual federal income tax returns in 


2002 


<15 15-29 30-99 
0.263 0.214 0.388 


100-199 =200 
0.102 0.033 


Income 


Probability 


(a) What is the probability that a randomly chosen 
return shows an adjusted gross income of $30,000 
or more? 

(b) Given that a return shows an income of at least 
$30,000, what is the conditional probability that the 
income is at least $100,000? 


Thomas’s pizza. You work at Thomas's pizza shop. 
You have the following information about the seven 
pizzas in the oven: three of the seven have thick 
crust, and of these one has only sausage and two 
have only mushrooms; the remaining four pizzas 
have regular crust, and of these two have only sau- 
sage and two have only mushrooms. Choose a pizza 
at random from the oven. 


(a) Are the events “getting a thick crust pizza” and “get- 
ting a pizza with mushrooms” independent? Explain. 

(b) You add an eighth pizza to the oven. This pizza 
has thick crust with only cheese. Now are the 
events “getting a thick crust pizza” and “getting a 
pizza with mushrooms” independent? Explain. 


A probability teaser. Suppose (as is roughly correct) 
that each child born is equally likely to be a boy or 
a girl and that the sexes of successive children are 
independent. If we let BG mean that the older child 
is a boy and the younger child is a girl, then each of 
the combinations BB, BG, GB, GG has probability 
0.25. Ashley and Brianna each have two children. 


(a) You know that at least one of Ashley’s children 
is a boy. What is the conditional probability that she 
has two boys? 

(b) You know that Brianna’s older child is a boy. 
What is the conditional probability that she has two 
boys? 

College degrees. A striking trend in higher educa- 
tion is that more women than men reach each level 
of attainment. The National Center for Educational 
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ASSOCIATE’S BACHELOR’S MASTER’S DOCTORATE TOTAL 

Female 601 1066 494 95 2256 
Bbil 803 322 87 1569 

Total 958 1869 816 182 3825 


(a) If you choose a degree recipient at random, what 
is the probability that the person you choose is a man? 
(b) What is the conditional probability that you 
choose a man, given that the person chosen 
received a master’s? 

(c) Are the events “choose a man” and “choose a 
master’s degree recipient” independent? How do 
you know? 


College degrees. Exercise 13.39 gives the projected 
counts (in thousands) of earned degrees in the 
United States in the 2015-16 academic year. Use 
these data to answer the following questions. 


(a) What is the probability that a randomly chosen 
degree recipient is a woman? 

(b) What is the conditional probability that the 
person chosen received an associate’s degree, given 
that she is a woman? 

(c) Use the general multiplication rule to find the 
probability of choosing a female associate’s degree 
recipient. Check your result by finding this prob- 
ability directly from the table of counts. 


Deer and pine seedlings. As suburban garden- 
ers know, deer will eat almost anything green. In a 
study of pine seedlings at an environmental center 
in Ohio, researchers noted how deer damage var- 
ied with how much of the seedling was covered by 
thorny undergrowth: 


DEER DAMAGE 
THORNY COVER YES NO 
None 60 151 
21 76 158 
13 to 2/3 44 lee 
Sys} 29 176 


(a) What is the probability that a randomly selected 
seedling was damaged by deer? 

(b) What are the conditional probabilities that a 
randomly selected seedling was damaged, given 
each level of cover? 

(c) Does knowing about the amount of thorny cover 
on a seedling change the probability of deer damage? 
If so, cover and damage are not independent. 


Peter Skinner/Science Source 


13.42 Deer and pine seedlings. In the setting of Exercise 
13.41, what percent of the trees that were not dam- 
aged by deer were more than two-thirds covered by 
thorny plants? 


13.43 Deer and pine seedlings. In the setting of Exercise 


13.41, what percent of the trees that were damaged 
by deer were less than one-third covered by thorny 
plants? 


Julie is graduating from college. She has studied biology, chemistry, 
and computing and hopes to use her science background in crime 
investigation. Late one night she thinks about some jobs for which 
she has applied. Let A, B , and C be the events that Julie is offered 
a job by 

A = the Connecticut Office of the Chief Medical Examiner 
B = the New Jersey Division of Criminal Justice 

C = the Federal Disaster Mortuary Operations Response Team 
Julie writes down her personal probabilities for being offered these 
jobs: 

P(A) = 0.5 P(B) = 0.4 P(C) = 02 

P(A and B) = 0.1 P(AandC) = 0.05 P(B and C) = 0.05 
P(A and B and C) = 0 

Make a Venn diagram of the events A, B, and C. As in Figure 13.4 


(see page 308), mark the probabilities of every intersection involving 
these events. Use this diagram for Exercises 13.44 to 13.46. 


13.44 Will Julie get a job offer? What is the probability 
that Julie is not offered any of the three jobs? 

Will Julie get just these offers? What is the prob- 
ability that Julie is offered the Connecticut job, but 
not the New Jersey or federal job? 
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13.46 Julie’s conditional probabilities. If Julie is offered 
the federal job, what is the conditional probability 
that she is also offered the New Jersey job? If Julie 
is offered the New Jersey job, what is the condi- 
tional probability that she is also offered the fed- 


eral job? 


13.47 The geometric distributions. You are rolling a pair of 


balanced dice in a board game. Rolls are independent. 


13.48 


13.49 


Exercises 


323 


You land in a danger zone that requires you to roll 
doubles (both faces show the same number of spots) 
before you are allowed to play again. How long will 
you wait to play again? 

(a) What is the probability of rolling doubles on 

a single toss of the dice? (If you need review, the 
possible outcomes appear in Figure 12.2 (page 282). 
All 36 outcomes are equally likely.) 

(b) What is the probability that you do not roll 
doubles on the first toss, but you do on the second 
toss? 

(c) What is the probability that the first two tosses 
are not doubles and the third toss is doubles? This is 
the probability that the first doubles occurs on the 
third toss. 

(d) Now you see the pattern. What is the probability 
that the first doubles occurs on the fourth toss? On the 
fifth toss? Give the general result: what is the probabil- 
ity that the first doubles occurs on the kth toss? 

(e) What is the probability that you get to go again 
within three turns? 


(Comment: The distribution of the number of trials 
to the first success is called a geometric distribution. In 
this problem, you have found geometric distribution 
probabilities when the probability of a success on 
each trial is 1/6. The same idea works for any prob- 
ability of success.) 


Winning at tennis. A player serving in tennis has 
two chances to get a serve into play. If the first 
serve is out, the player serves again. If the second 
serve is also out, the player loses the point. Here are 
probabilities based on four years of the Wimbledon 
Championship: 


P(1st serve in) = 0.59 

P(win point | lst serve in) = 0.73 

P(2nd serve in | Ist serve out) = 0.86 

P(win point | lst serve out and 2nd serve in) = 0.59 


Make a tree diagram for the results of the two serves 
and the outcome (win or lose) of the point. (The 
branches in your tree have different numbers of 
stages depending on the outcome of the first serve.) 
What is the probability that the serving player wins 
the point? 

Peanut allergies among children. About 2% of chil- 
dren in the United States are allergic to peanuts.” 
Choose three children at random and let the ran- 
dom variable X be the number in this sample who are 
allergic to peanuts. The possible values X can take 
are 0, 1, 2, and 3. Make a three-stage tree diagram of 
the outcomes (allergic or not allergic) for the three 
individuals and use it to find the probability distribu- 
tion of X. 
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Winning at tennis, continued. Based on your work make a pledge either do or don’t actually make a 
in Exercise 13.48, in what percent of points won by contribution. 
the server was the first serve in? (Write this ao cOn: (a) What percent of calls result in a contribution? 
ditional probability and use the definition of condi- (b) What percent of those who contribute are 
tional probability.) Fecont cdGnots! 
Peanut allergies among children, continued. | DNA Forensics. When a suspects DNA is compared to a sample 
Continue your work from Exercise 13.49. What — of DNA collected at a crime scene, the comparison is made between 
is the conditional probability that exactly two of certain sections of the DNA called loci. Each locus has two alleles 
the children will be allergic to peanuts, given that (gene forms), one inherited from our mother and the other from our 
at least one of the three children suffers from this _ father. Suppose there are two alleles for a particular locus called A and 
allergy? B. These alleles can be present at the locus in three combinations. A 
Lactose intolerance. Lactose intolerance causes dif. Person could have both alleles at the locus.be A, one allele be A and 
ficulty digesting dairy products that contain lactose the other B, or both alleles be B, giving the three combinations (A and 
(milk sugar). It is particularly common among people A), (A and B), and (B and B). Here’s how the math works. If the 
of African and Asian ancestry. In the United States Proportion of the population with allele A as one of their alleles at the 
(ignoring other groups and people who consider locus is a, and the proportion of the population with allele B as one of 
themselves to belong to more than one race), 82% their alleles at the locus is b, then the proportion of the population with 
of the population is white, 14% is black, and 4% is the three combinations of these allele types at the locus follows: 
Aslan: Moreover, 15% of whites, ae of blacks, and POPULATION PROPORTION 
90% of Asians are lactose intolerant. ALLELES AT THE LOCUS WITH ALLELE COMBINATION 
(a) What percent of the entire population is lactose To = 
intolerant? eas ih 
(b) What percent of people who are lactose intoler- = ie 
ant are Asian? BandB hb? 
Fundraising by telephone. Tree di 
Cone tne! Set wears | UGat is information in Exercises 13.54 and 13.55. The numbers 
organize problems having more than two stages. Rene : Fe onpidaenae 2 
Figure 13.7 shows probabilities for a charity call- Pei we phen) ate ec 
ing potential donors by telephone.’? Each person 13.54 Suppose the locus D21S11 has two alleles called 
called is either a recent donor, a past donor, or a 29 and 31. The proportion of the Caucasian popu- 
new prospect. At the next stage, the person called lation with allele 29 is 0.181 and with allele 31 is 
either does or does not pledge to contribute, with 0.071. What proportion of the Caucasian population 
conditional probabilities that depend on the donor has the combination (29, 31) at the locus D21S11? 
class the person belongs to. Finally, those who What proportion has the combination (29, 29)? 
we Check Contribute 
fee ee Nocheck Not 
= 0 pledge 
= Check Contribute 
03 Pledge 
03 oe a 
eo. 04 Nocheck Not 
— No pledge 


edge 


0.2 LS Check 
p ee 


New A 
prospect 


0.9 No pledge 


FIGURE 13.7 

Tree diagram for fundraising by telephone, 
for Exercise 13.53. The three stages are 
the type of prospect called, whether or not 
the person makes a pledge, and whether 
or not a person who pledges actually 
makes a contribution. 


Contribute 


Nocheck Not 


13.55 Suppose the locus D3S1358 has two alleles called 16 
and 17. The proportion of the Caucasian population 
with allele 16 is 0.232 and with allele 17 is 0.213. 
What proportion of the Caucasian population has 
the combination (16, 17) at the locus D3S1358? 


One important fact regarding the loci evaluated in such forensic 
tests is that the allele combinations at each locus have been shown 
to be independent. Use this information in Exercise 13.56. 


13.56 What proportion of the Caucasian population has 
the combination (29, 31) at the locus D21S11 and 
combination (16, 17) at the locus D3S1358? As we 
specify the alleles present at more loci, what will hap- 
pen to the proportion of the Caucasian population 
that matches the allele combinations at all the loci? 


A defendant in Ohio was indicted on December 17, 2009, on 
the charges of Aggravated Burglary and Assault (State of Ohio vs. 
Myers, Case No. 09 CR 666). In this case, a hair found at the 
crime scene was tested at six loci and demonstrated a specific 
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combination of alleles found in a proportion of about one in 1.6 million 
individuals in the population. Comparison of the DNA profile 
found on the hair to a database of convicted felons revealed a match 
between the allelic profile found on the hair and an individual in the 
database (the defendant). Defense attorneys in the case requested 
that the state perform additional DNA testing because several pre- 
viously untested loci were available to test. The results of this testing 
revealed that the defendant did not match at some of these newly 
tested markers, indicating that the DNA from the hair was not the 
defendant’s and the charges were dropped. Use this information in 
Exercise 13.57. 


13.57 Ifthe DNA profile (or combination of alleles) found 
on the hair is possessed by one in 1.6 million indi- 
viduals, and the database of convicted felons con- 
tains 4.5 million individuals, approximately how 
many individuals in the database would demonstrate 
a match between their DNA and that found on the 


hair? 


13.58 SAT Mathematics scores. The website http: //research.college 
board.org/programs/sat/data presents data for high school seniors 
who participated in the SAT program from both the current year and previous 
years. Under Data, click on the link for College Bound Seniors for the most recent 
year given. In the window that opens, click on the link for Total Group Report for 
this year. The Total Group Profile Report presents data for high school graduates 
who participated in the SAT program that year. Students are counted only once, 
no matter how often they are tested, and only their latest scores are summarized. 
Suppose a high school graduate is selected at random. Use the information in the 


tables available to answer the following questions. 


(a) What is the probability that the selected student is female? 


(b) What is the probability that the selected student scores 600 or over on the 


Mathematics section of the SAT? 


(c) What is the conditional probability that the selected student scores 600 or 
over on the Mathematics section given that the student is male? 


(d) What is the conditional probability that the selected student scores 600 or 
over on the Mathematics section given that the student is female? 


(e) Are scoring over 600 on the Mathematics section and sex independent? If 
not, explain in a simple sentence the nature of the dependence. 


13.59 


Let’s make a deal. The Monty Hall Problem is an example of a simple prob- 


ability problem with an answer that is counterintuitive. The problem was made 
popular when Marilyn vos Savant published it in her Parade Magazine column.”! 


Here is the question: 


Suppose you’re on a game show, and you're given a choice of three doors: Behind 
one door is a car; behind the others, goats. You pick a door, say number 1, and 
the host, who knows what’s behind the doors, opens another door, say number 3, 
which has a goat. He says to you, “Do you want to pick door number 2?” Is it to 


your advantage to switch your choice of doors? 
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(a) Go to the website www.nytimes.com/2008/04/08/science/ 
O8monty.html and click on Let’s Play. Choose a door and click on it, and 
then click on Continue. Now click on Switch. Did you win? Click on Try Again 
and play the game 30 times, switching doors each time. What percent of the time 
did you win by switching? What percent of the time would you have won if you 
didn’t switch? Based on your simulation, does it seem better to switch or stay with 
your initial choice? 


(b) The website provides an explanation of why it is better to switch doors. 
Click on See How It Works. The probability of winning if you switch is 2/3, and if 
you don't switch it is 1/3. How well do these probabilities agree with the propor- 
tions you found in your simulation? 
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In this chapter 


Binomial Distributions: we cover. 


14.1. The binomial setting and 
binomial distributions 


14.2 Binomial distributions in 
statistical sampling 


14.3 Binomial probabilities 


basketball player shoots five free throws; how many does she make? A sample Toso MSU Useuteleay) 


survey dials 1200 landline phone numbers at random; how many of the num- 14.5 Binomial mean and 
bers dialed correspond to working residential numbers? You plant 10 dogwood standard deviation 
trees; how many live through the winter? In all these situations, we want a probabil- ieee nennalarprorinaion 


ity model for a count of successful outcomes out of a fixed (known) number of trials. 


to binomial distributions 


The distribution of a count depends on how the data are produced. Here is a 
common situation. 


The Binomial Setting 


1. There are a fixed number n of observations. 


2. The n observations are all independent. That is, knowing the result of one 
observation does not change the probabilities we assign to other observations. 


3. Each observation falls into one of just two categories, which for convenience we 
call “success” and “failure.” 


4. The probability of a success, call it p, is the same for each observation. 


*This chapter concerns a special topic in probability. The material is not needed to understand the 


rest of the book. 327 
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usti¢s WAS HE GOOD OR Think of tossing a coin n times as an example of the binomial setting. Each toss 
Wd WAS HE LUCKY? gives either heads or tails. Knowing the outcome of one toss doesn’t change the prob- 
When a baseball player hits .300, ability of a head on any other toss, so the tosses are independent. If we call heads a 
everyone applauds. A .300 hitter success, then p is the probability of a head and remains the same as long as we toss 


Saas ee 
gets a hit in 30% of times at bat. the same coin. For tossing a coin, p is close to 0.5. If we spin the coin on a flat surface 
Could a .300 year just be luck? 


rather than toss it, p is not equal to 0.5. The number of heads we count is a discrete 
Typical major leaguers bat about 500 ; ae : i et 
times a season and hit about .260.A  tandom variable X. The distribution of X is called a binomial distribution. 
hitter’s successive tries seem to be 


independent, so we have a binomial 
setting. From this model, we can 


Binomial Distribution 


calculate or simulate the probability The count X of successes in the binomial setting has the binomial distribution with 
of hitting 300. It is about 0.025. Out parameters n and p. The parameter n is the number of observations, and p is the prob- 
of 100 run-of-the-mill major league ability of a success on any one observation. The possible values of X are the whole 


hitters, two or three each year will 
bat .300 because they were lucky. 


numbers from 0 to n. 


The binomial distributions are an important class of discrete probability 
models. Pay attention to the binomial setting, because not all counts have bino- 
mial distributions. 


Genetics says that children receive genes from their parents independently. Each child 
of a particular pair of parents has probability 0.25 of having type O blood. If these 
parents have five children, the number who have type O blood is the count X of suc- 
cesses in five independent observations with probability 0.25 of a success on each 
observation. So X has the binomial distribution with n = 5 and p = 0.25. m 


Here is a set of genetic examples that require more thought. 

Choose two births at random from the last year’s births at a large hospital and 
count the number of boys (O, 1, or 2). The sexes of children born to different moth- 
ers are surely independent. The probability that a randomly chosen birth in Canada 
and the United States is a boy is about 0.52. (Why it is not 0.5 is something of a 
mystery.) So the count of boys has a binomial distribution with n = 2 and p = 0.52. 

Next, observe successive births at a large hospital, and let X be the number of 
births until the first boy is born. Births are independent, and each has probability 
0.52 of being a boy. Yet X is not binomial because there is no fixed number of obser- 
vations. “Count observations until the first success” is a different setting than “count 
the number of successes in a fixed number of observations.” 

Finally, choose at random a family with exactly two children and count the number 
of boys. Careful study of such families shows that the count of boys is not binomial: 
the probability of exactly one boy is too high.' Families are less likely to have a third 
child if the first two are a boy and a girl, so when we look at families that stopped 
at two children, “one of each” is more common than if we look at randomly chosen 
births. The sexes of successive children in two-child families are not independent 
because the parents’ choices interfere with the genetics. m 


The binomial distributions are important in statistics when we wish to make infer- 
ences about the proportion p of “successes” in a population. Here is a typical example. 
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Physical damage to tomatoes, which can occur throughout the distribu- 
tion system from field to consumer, has a major impact on market loss 
of fresh tomatoes. At the packing house, a distributor inspects an SRS 
of 10 tomatoes from a shipment of 10,000 tomatoes. Suppose that 
(unknown to the distributor) 11% of tomatoes in the shipment can be 
considered unmarketable due to physical damage, most generally bruising.* 
Count the number X of unmarketable tomatoes in the sample. 

This is not quite a binomial setting. Removing one tomato changes the 
proportion of unmarketable tomatoes remaining in the shipment. So 
the probability that the second tomato chosen is unmarketable changes 
when we know whether the first is unmarketable or not. But removing 
one tomato from a shipment of 10,000 changes the makeup of the 
remaining 9999 tomatoes very little. In practice, the distribution of X is 
very close to the binomial distribution with n = 10 and p = 0.11. m8 


lola 1960/Shutterstock 


Example 14.3 shows how we can use the binomial distributions in the statistical 
setting of selecting an SRS. When the population is much larger than the sample, a 
count of successes in an SRS of size n has approximately the binomial distribution with 
n equal to the sample size and p equal to the proportion of successes in the population. 


Sampling Distribution of a Count 


Choose an SRS of size n from a population with proportion p of successes. When the 
population is much larger than the sample, the count X of successes in the sample has 
approximately the binomial distribution with parameters n and p. 


() LaunchPad Online Resources 


* The StatBoards video, The Binomial Scenario, reviews the assumptions 
required to use the binomial distribution and how to recognize when these 
assumptions are met. 


Apply Your Knowledge 


In each of Exercises 14.1 to 14.3, X is a count. Does X have a binomial distribution? Give 
your reasons in each case. 


14.1 Working Numbers. When an opinion poll calls landline telephone num- 
bers at random, approximately 30% of the numbers are working residential 
phone numbers. The remainder are either non-residential, non-working, 
or computer/fax numbers.? You watch the random dialing machine make 
20 calls. X is the number that reach a working residential number. 


14.2 Working Numbers. When an opinion poll calls landline telephone numbers 
at random, approximately 30% of the numbers are working residential phone 
numbers. The remainder are either non-residential, non-working, or computer/ 
fax numbers. You watch the random dialing machine make calls. X is the 
number of calls until the first working residential number is reached. 

14.3 Boxes of Tiles. Boxes of six-inch slate flooring tile contain 40 tiles per box. 
The count X is the number of cracked tiles in a box. You have noticed that 
most boxes contain no cracked tiles, but if there are cracked tiles in a box, 
then there are usually several. 

14.4 Smoking in Canada. Statistics Canada reports that 19.9% of Canadians 
aged 12 and older smoke either daily or occasionally.* If you take an SRS of 
1500 Canadians aged 12 and over, what is the approximate distribution of the 
number in your sample who smoke either daily or occasionally? 
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wat Looks 
gin RANDOM? 


Toss a coin six times and record 
heads (H) or tails (T) on each toss. 
Which of these outcomes is more 
probable: HTHTTH or TTTHHH? 
Almost everyone says that HTHTTH 
is more probable because TTTHHH 
does not “look random.” In fact, both 
are equally probable. That heads has 
probability 0.5 says that about half of 
a very long sequence of tosses will be 
heads. It doesn’t say that heads and 
tails must come close to alternating 
in the short run. The coin doesn’t 
know what past outcomes were, 

and it can’t try to create a balanced 
sequence. 
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We can find a formula for the probability that a binomial random variable takes 
any value by adding probabilities for the different ways of getting exactly that many 
successes in n observations. Here is an example that illustrates the idea. 


The blood types of successive children born to the same parents are independent 
and have fixed probabilities that depend on the genetic makeup of the parents. Each 
child born to a particular set of parents has probability 0.25 of having blood type O. 
If these parents have five children, what is the probability that exactly two of them 
have type O blood? 

The count of children with type O blood is a binomial random variable X with n = 5 
tries and probability p = 0.25 of a success on each try. We want P(X = 2). — 


Because the method doesn’t depend on the specific example, let’s use “S” for 
success and “F” for failure for short, with a success representing type O blood. Do 
the work in two steps. 


Step 1. Find the probability that a specific two of the five tries—say, the first and 
the third—give successes. This is the outcome SFSFE Because tries are independent, 
the multiplication rule for independent events applies. The probability we want is 


P(SESFE) = P(S)P(F)P(S)P(E)P(E) 
= (0.25)(0.75)(0.25)(0.75)(0.75) 
= (0.25)(0.75)? 
Step 2. Observe that any one arrangement of two Ss and three Fs has this 
same probability. This is true because we multiply together 0.25 twice and 0.75 
three times whenever we have two Ss and three Fs. The probability that X = 2 is 


the probability of getting two Ss and three Fs in any arrangement whatsoever. Here 
are all the possible arrangements: 


SSFFF  SFSFF 
FSESF FSFFS 


There are 10 of them, all with the same probability. The overall probability of two 
successes is therefore 


SFFSF  SFFFS 
FFSSF  FESFS 


FSSFF 
FFESS 


P(X = 2) = 10(0.25)(0.75)> = 0.2637 


The pattern of this calculation works for any binomial probability. To use it, we 
must count the number of arrangements of k successes in n observations. We use 
the following fact to do the counting without actually listing all the arrangements. 


Binomial Coefficient 


The number of ways of arranging k successes among n observations is given by the 
binomial coefficient 


fOr ke Ow lt 
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The formula for binomial coefficients uses the factorial notation. For any posi- _ factorial 
tive whole number n, its factorial n! is 


w=nX = 1l Xia — 2) xX XO KZ XI 


In addition, we define 0! = 1. 

The larger of the two factorials in the denominator of a binomial coefficient will 
cancel much of the n! in the numerator. For example, the binomial coefficient we 
need for Example 14.4 is 


A The binomial coefficient (;) is not related to the fraction 3. A helpful way to 


remember its meaning is to read it as “5 choose 2.” Binomial coefficients 
have many uses, but we are interested in them only as an aid to finding binomial 
n 
k 
which k successes can be arranged among n observations. The binomial probability 
P(X =k) is this count multiplied by the probability of any one specific arrangement 
of the k successes. Here is the result we seek. 


probabilities. The binomial coefficient ( ) counts the number of different ways in 


Binomial Probability 


If X has the binomial distribution with n observations and probability p of success on 
each observation, the possible values of X are 0, 1, 2,...,n. Ifk is any one of these 
values, 


P(X = k) = ("ec — pyr 


The number X of unmarketable tomatoes in Example 14.3 has approximately the 
binomial distribution with n = 10 and p = 0.11. 
The probability that the sample contains no more than one unmarketable tomato is 


P(X <1) = P(X = 1) + P(X =) 
= 7 (0.11)'(0.89)° + (P)o1 1)°(0.89)'° 


= 10 6.419(0.3804) + (410.3118) 
119) ‘ OHO! 


= (10)(0.11)(0.3504) + (1)(1)(0.3118) 
= 0.3854 + 0.3118 = 0.6972 


This calculation uses the facts that O! = 1 and that a® = 1 for any number a other 
than O. We see that about 70% of all samples will contain no more than one unmar- 
ketable tomato. In fact, about 31% of the samples will contain no unmarketable toma- 
toes. A sample of size 10 cannot be trusted to alert the distributor to the presence 
of unacceptable tomatoes in the shipment. 
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The complement rule described in Chapter 12 can make the computation of 
certain binomial probabilities simpler. For example, the probability that the sample 
contains at least one unmarketable tomato is 


P(X=1)=P(X=1)+ P(X = 2) + + A(X = 10) 
=1-P(X=0) 
= 1 — 0.3118 = 0.6882 


When computing binomial probabilities by hand, it is useful to keep the complement 
rule in mind. m 


The binomial probability formula is awkward to use unless the number of obser- 
vations n is quite small. You can find tables of binomial probabilities P(X = k) and 
cumulative probabilities P(X = k) for selected values of n and p, but the most effi- 
cient way to do binomial calculations is to use technology. 

Figure 14.1 shows output for the calculation in Example 14.5 from a graphing calculator, 
two statistical programs, and a spreadsheet program. We asked all four to give cumula- 
tive probabilities. The calculator, Minitab, and Crunchlt! have menu entries for binomial 
cumulative probabilities. Excel has no menu entry, but the worksheet function BINOM. 
DIST is available. All the outputs agree with the result 0.6972 of Example 14.5. JMP can 
also be used to obtain binomial probabilities, but you need to access the formula editor. 


Texas Instruments Graphing Calculator 


binomcdf (10,0.11,1) 


.6972 


Minitab 

f@ Session 
Cumulative Distribution Function 
Binomial with n= 10 and p=0.11 


P(X <=x) 
0.311817 
0.697209 
0.911557 
0.982203 


Crunchlt! 
Binomial Distribution Calculator «|x 
Parameters 
n: 10 
p: 11 


Quantile 


FIGURE 14.1 ———_-----— 
) =0.6972092433778954 


The binomial probability P(X = 1) for 
Example 14.5: output from a graphing 
calculator, two statistical programs, and | 
a spreadsheet program. 


Help Cancel 


Microsoft Excel 


Al v (* ifs 


=BINOMDIST(1,10,0. 


#) Book1 = 


1 0.697209 


\ 4 > >| Sheet1 .“Sheet2 | 


A | B | ¢ | bd | CE 


_Sheet3_ 
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* The StatBoards video, Calculating Binomial Probabilities, illustrates the use 


of the formula to compute binomial probabilities. 


Apply Your Knowledge 


14.5 


14.6 


14.7 


Proofreading. Typing errors in a text are either nonword errors (as when 
“the” is typed as “teh”) or word errors that result in a real but incorrect word. 
Spell-checking software will catch nonword errors but not word errors. 
Human proofreaders catch 70% of word errors. You ask a fellow student to 
proofread an essay in which you have deliberately made 10 word errors. 

(a) Ifthe student matches the usual 70% rate, what is the distribution of 
the number of errors caught? What is the distribution of the number of 
errors missed? 

(b) Missing three or more out of 10 errors seems a poor performance. 
What is the probability that a proofreader who catches 70% of word 
errors misses exactly three out of 10? If you use software, also find the 
probability of missing three or more out of 10. 


Working Numbers. When an opinion poll calls landline telephone num- 

bers at random, approximately 30% of the numbers are working residential 

phone numbers. The remainder are either non-residential, non-working, or 

computer/fax numbers. You watch the random dialing machine make 20 calls. 

(a) What is the probability that exactly three calls reach working residential 
numbers? 

(b) What is the probability that at most three calls reach working residential 
numbers? 

(c) What is the probability that at least three calls reach working residential 
numbers? 

(d) What is the probability that fewer than three calls reach working 
residential numbers? 

(e) What is the probability that more than three calls reach working 
residential numbers? 


Google Does Binomial. Point your web browser to www. google.com. 
Instead of searching the web or looking for images, you can request a calcula- 
tion in the Search box. 

(a) Enter 5 choose 2 and click Search. What does Google return? 

(b) You see that Google calculates the binomial coefficient “5 choose 2.” 
What are the values of the binomial coefficients for “500 choose 2” 
and “500 choose 100”? We expect that there are more ways to choose 
100 than to choose 2, but how many more may be a surprise. That 
e+107 in Google’s answer means a | followed by 107 zeros. 

(c) Google also does binomial probabilities. Enter (10 choose 1)*0.11* 
0.89%9 to find P(X = 1) in Example 14.5. What is Google’s answer 
with all its decimal places? 
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FIGURE 14.1 
(Continued) 
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If a count X has the binomial distribution based on n observations with probability p 
of success, what is its mean ju? That is, in very many repetitions of the binomial setting, 
what will be the average count of successes? We can guess the answer. If a basketball player 
makes 80% of her free throws, the mean number made in 10 tries should be 80% of 10, or 
8. In general, the mean of a binomial distribution should be 7 = np. Here are the facts. 


Binomial Mean and Standard Deviation 


If a count X has the binomial distribution with number of observations n and prob- 
ability of success p, the mean and standard deviation of X are 


f= np 


a = Vnp(l — p) 


Remember that these short formulas are good only for binomial distributions. 
They can’t be used for other distributions. 


Continuing Example 14.5, the count X of unmarketable tomatoes is binomial with 


TQ RANDOMNESS 


pt n= 10 and p = 0.11. The histogram in Figure 14.2 displays this probability distribution. 

ull TURNS SILVER (Because probabilities are long-run proportions, using probabilities as the heights of the 
TO BRONZE bars shows what the distribution of X would be in very many repetitions.) The distribution 

After many charges of favoritism is strongly right skewed. Although X can take any whole-number value from zero to 10, the 

by judges, the rules for scoring probabilities of values larger than five are so small that they do not appear in the histogram. 

international figure skating The mean and standard deviation of the binomial distribution in Figure 14.2 are 

competitions changed in 2004. 

The big change is that 12 judges b= np 

score - sah ern ae = (10)(0.11) = 1.1 

scores trom three judges chosen at / 

random are ee ~ each part a = Vnp(i — p) 

of the program. = V(10)(0.11)(0.89) = V0.979 = 0.9894 

So there ae(1)) = 220 possible The mean is marked on the probability histogram in Figure 14.2. m= 


panels of nine judges for (say) the 
“Free Skate,” and these panels 
will have slightly different scores. 
Result: at the 2006 World Figure 
Skating Championships, the 


=e 


0.4 


Russian pair Maria Petrova and 03 
Alexei Tikhonov received the 
bronze medal when the consensus £ 
of all 12 judges would have given s 
them the silver medal. Perhaps Ss 0.2 
the system needs another change. £ 
01 
FIGURE 14.2 0.0 
Probability histogram for the binomial ) il » 3 4 5 6 


distribution with n = 10 and 
p = 0.11, for Example 14.6. 


Count of unmarketable tomatoes 
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(4) LaunchPad Online Resources 


* The StatBoards video, The Mean and Standard Deviation of Binomials, 
provides an additional example of the computation of the mean and standard 
deviation for a binomial distribution. 


Apply Your Knowledge 


14.8 Working Numbers. When an opinion poll calls landline telephone num- 
bers at random, approximately 30% of the numbers are working residential 
phone numbers. The remainder are either non-residential, non-working, or 
computer/fax numbers. You watch the random dialing machine make 20 calls. 


(a) What is the mean number of calls that reach working residential numbers? 


(b) What is the standard deviation o of the count of calls that reach working 
residential numbers? 


(c) Suppose that the probability of reaching a working residential number 
was only p = 0.20. How does this new p affect the standard deviation? 
What would be the standard deviation if p = 0.10? What does your 
work show about the behavior of the standard deviation of a binomial 
distribution as the probability of a success gets closer to zero? 

14.9 Proofreading. Return to the proofreading setting of Exercise 14.5. 

(a) IfX is the number of word errors missed, what is the distribution of X? 
If Y is the number of word errors caught, what is the distribution of Y? 

(b) What is the mean number of errors caught? What is the mean number 
of errors missed? The mean counts of successes and of failures always 
add to n, the number of observations. 

(c) What is the standard deviation of the number of errors caught? What is the 
standard deviation of the number of errors missed? The standard deviations 
of the count of successes and the count of failures are always the same. 


It isn’t practical to use the formula for binomial probabilities when the number 
of observations n is large. (Look at part (b) of Exercise 14.7 to see why.) Software 
or a graphing calculator will handle many problems that are beyond the scope of 
hand calculation. If technology does not rescue you, there is another alternative: as 
the number of observations n gets larger, the binomial distribution gets close to a Normal 
distribution. When n is large, we can use Normal probability calculations to approxi- 
mate binomial probabilities. Here are the facts. 


Normal Approximation for Binomial Distributions 


Suppose that a count X has the binomial distribution with n observations and suc- 
cess probability p. When n is large, the distribution of X is approximately Normal, 


N(np, Vnp(1 — p))- 
As a rule of thumb, we will use the Normal approximation when n is so large that 
np = 10 and n(1 — p) = 10. 


The Normal approximation is easy to remember because it says to act as if X is 
Normal with exactly the same mean and standard deviation as the binomial. The 
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accuracy of the Normal approximation improves as the sample size n increases. It 
is most accurate for any fixed n when p is close to 1/2, and least accurate when p 
is near O or 1. This is why the rule of thumb in the box depends on p as well as n. 


Sample surveys show that more people are tracking changes in their health on paper, 
spreadsheet, mobile device, or just “in their heads.” A survey asked a nationwide random 
sample of 3014 adults, “Now thinking about your health overall, do you currently keep 
track of your own weight, diet, or exercise routine, or is this not something you currently 
do?”° The population that the poll wants to draw conclusions about is all U.S. residents 
aged 18 and over. Suppose that in fact 62% of all adult U.S. residents would say “Yes” if 
asked this question. What is the probability that 1900 or more of the sample say “Yes”? m 


Because there are about 235 million adults in the United States, the responses of 
3014 randomly chosen adults are very close to independent. So the number in our 
sample who are tracking their weight, diet, or exercise routine is a random vari- 
able X having the binomial distribution with n = 3014 and p =0.62. To find the 
probability P(X = 1900) that at least 1900 of the people in the sample are tracking 
their weight, diet, or exercise routine, we must add the binomial probabilities of all 
outcomes from X = 1900 to X = 3014. Figure 14.3 is a probability histogram of 
this binomial distribution, from Minitab. As the Normal approximation suggests, 
the shape of the distribution looks Normal. The probability we want is the sum of 
the heights of the shaded bars. Here are three ways to find this probability. 


Distribution Plot 
Binomial, n = 3014, p = 0.62 


0.016 4 
0.014 4 
0.012 4 
& 0.010 4 
3 0.008 4 
» 
FIGURE are a aoe 
Probability histogram for the binomial o 
distribution with n = 3014, p = 0.62. 0.004 5 
The bars at and above 1900 are shaded 0.002 4 
to highlight the probability of getting at 0.000 
least 1900 successes. The shape of this , 1900 
binomial probability distribution closely xX 


resembles a Normal curve. 


1. Use technology. Statistical software can find the exact binomial probability. In 
most cases, software finds cumulative probabilities P(X = x). So start by writing 


P(X = 1900) = 1 — P(X S 1899) 
Here is Minitab’s answer for P(X =< 1899): 
Binomial with n = 3014 and p = 0.62 


x P(X<=x) 
1899 0.876384 


The probability we want is 1 — 0.876384 = 0.123616, correct to six decimal places. 


2. Simulate a large number of samples. Figure 14.4 displays a histogram of the 
counts X from 5000 samples of size 3014 when the truth about the population is 
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8004 


600 4 


400-4 


200 4 


Number of occurrences in 5000 samples 


1780 1800 1820 1840 1860 1880 1900 1920 1940 1960 
Count X of successes 


p = 0.62. The simulated distribution, like the exact distribution in Figure 14.3, 
looks Normal. Because 649 of these 5000 samples have X at least 1900, the 
probability estimated from the simulation is 


649 
P(X = 1900) = —— = 0.1298 
( ) 5000 
This estimate misses the true probability by about 0.006. The law of large 
numbers says that the results of such simulations always get closer to the true 
probability as we simulate more and more samples. 


3. Both of the previous methods require software. We can avoid the need for 
software by using the Normal approximation. 


Act as though the count X in Example 14.7 has the Normal distribution with the same 
mean and standard deviation as the binomial distribution: 


we = np = (3014)(0.62) = 1868.68 
o = Vnp(1 — p) = V(3014)(0.62)(0.38) = 26.648 


Standardizing X gives a standard Normal variable Z. The probability we want is 


X — 1868.68 _ 1900 — 1868.68 
P(X=1 =P = 
eee ( 26.648 26.648 ) 
P(Z = 1.18) 
= 1 — 0.8810 = 0.1190 


The Normal approximation 0.1190 misses the true probability calculated in Example 14.7 
by about 0.005. m 


FIGURE 14.4 
Histogram of 5000 simulated binomial 
counts (n = 3014, p = 0.62). 


The Normal Approximation to Binomial applet shows in visual form how well the 
Normal approximation fits the binomial distribution for any n and p. You can slide n 
and watch the approximation get better. Whether or not the Normal approximation 
is satisfactory depends on how accurate your calculations need to be. For most sta- 
tistical purposes, great accuracy is not required. Our rule of thumb for use of the 
Normal approximation reflects this judgment. 
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(4) LaunchPad Online Resources 


* The StatBoards video, The Normal Approximation to the Binomial, reviews the 
details of using the Normal distribution to approximate a binomial probability. 


Apply Your Knowledge 


14.10 Using Benford’s Law. According to Benford’s law (Example 12.7, page 287) 
the probability that the first digit of the amount of a randomly chosen 
invoice is a 1 or a 2 is 0.477. You examine 90 invoices from a vendor and 
find that 29 have first digits 1 or 2. If Benford’s law holds, the count of 1s 
and 2s will have the binomial distribution with n = 90 and p = 0.477. Too 
few ls and 2s suggests fraud. What is the approximate probability of 29 or 
fewer Is and 2s if the invoices follow Benford’s law? Do you suspect that the 
invoice amounts are not genuine? 


14.11 College Admissions. A small liberal arts college in Ohio would like to 
have an entering class of 450 students next year. Past experience shows that 
about 37% of the students admitted will decide to attend. The college is 
planning to admit 1175 students. Suppose that students make their decisions 
independently and that the probability is 0.37 that a randomly chosen stu- 
dent will accept the offer of admission. 

(a) What are the mean and standard deviation of the number of students 
who accept the admissions offer from this college? 

(b) Use the Normal approximation to approximate the probability that the 
college gets more students than it wants. Be sure to check that you can 
safely use the approximation. 

(c) Use software or an online binomial calculator to compute the exact 
probability that the college gets more students than it wants. How 
good is the approximation in part (b)? 

(d) ‘To decrease the probability of getting more students than are wanted, 
does the college need to increase or decrease the number of students 
it admits? Using software or an online binomial calculator, what is the 
largest number of students that the college can admit if administrators 
want the exact probability of getting more students than they want to 
be no larger than 5%? 


14.12 Checking for Survey Errors. One way of checking the effect of under- 
coverage, nonresponse, and other sources of error in a sample survey is to 
compare the sample with known facts about the population. About 24% of 
the Canadian population over 15 are first generation, that is, they were born 
outside Canada. The number X of first-generation Canadians in random 
samples of 1000 persons over 15 should therefore vary with the binomial 
(n = 1000, p = 0.24) distribution. 

(a) What are the mean and standard deviation of X? 

(b) Use the Normal approximation to find the probability that the sample 
will contain between 210 and 270 first-generation Canadians. Be sure 
to check that you can safely use the approximation. 


CHAPTER 14 SUMMARY 


Chapter Specifics 


m Acount X of successes has a binomial distribution in the binomial setting: there are n 
observations; the observations are independent of each other; each observation results 
in a success or a failure; and each observation has the same probability p of a success. 


mg The binomial distribution with n observations and probability p of success gives a good 
approximation to the sampling distribution of the count of successes in an SRS of size 
n from a large population containing proportion p of successes. 


m If X has the binomial distribution with parameters n and p, the possible values of X are 
the whole numbers 0, 1, 2,...,n. The binomial probability that X takes any of these 
values is 


n 


P(X =k) = (oc — pyr-k 


Binomial probabilities in practice are best found using software. 


(t) = way 


counts the number of ways k successes can be arranged among n observations. Here the 
factorial n! is 


a The binomial coefficient 


nl} =n X(n—1)X(n-—2)X-X3X2X1 
for positive whole numbers n, and 0! = 1. 


m The mean and standard deviation of a binomial count X are 
= np 
ao = Vnp(1 — p) 


m The Normal approximation to the binomial distribution says that if X is a count 
having the binomial distribution with parameters n and p, then when n is large, X 
is approximately N(np, Vnp(1 — p)). Use this approximation only when np = 10 and 
n(1 — p) = 10. 


Link It 


The binomial distribution is used to compute probabilities for the count of successes among 
n observations that are produced under the binomial setting. An important situation for 
which the binomial setting can be used is when we choose an SRS from a population with 
a proportion p of successes. When the success probability p is known, probabilities associ- 
ated with the number of successes among n observations can be computed using either the 
binomial formula, or the Normal approximation when the sample size is large enough. 

An important application of the binomial distribution is in making inferences about 
the proportion of some outcome in a population. This is described in Chapter 22 where our 
data is collected under the binomial setting, but the proportion with the given outcome 
in the population is not known. For example, we may be interested in learning about the 
proportion of young adults (aged 19 to 25) that still live at home with their parents based 
on a sample from the population of young adults. When we are interested in making infer- 
ences about an unknown proportion in a population, we generally work with the proportion 
of successes in the sample rather than the count of successes in the sample. When using the 
proportion of successes in the sample to answer questions and draw conclusions about this 
unknown proportion in the population, the statistical methods are still related to the bino- 
mial distribution and the Normal approximation to the binomial. 


(4) LaunchPad Online Resources 
If you are having difficulty with any of the sections of this chapter, these online 
resources should help prepare you to solve the exercises at the end of this chapter. 


* StatTutor starts with a video review of each section and asks a series of 
questions to check your understanding. 

* LearningCurve provides you with a series of questions about the chapter 
geared to your level of understanding. 


Summary 


339 


340 CHAPTER 14 @ Binomial Distributions 


| CHECK YOUR SKILLS 


14.13 Larry reads that one out of four eggs contains salmo- 
nella bacteria. So he never uses more than three eggs 
in cooking. If eggs do or don’t contain salmonella 
independent of each other, the number of contami- 
nated eggs when Larry uses three chosen at random 
has the distribution 

(a) binomial with n = 4 and p = 1/4. 

(b) binomial with n = 3 and p = 1/4. 

(c) binomial with n = 3 and p = 1/3. 

In Exercise 14.13, the probability that at least one of 
Larry’s three eggs contains salmonella is about 


(a) 0.68. (b) 0.58. (c) 0.30. 


In a group of 10 college students, four are business 
majors. You choose three of the 10 students at ran- 
dom and ask their major. The distribution of the 
number of business majors you choose is 

(a) binomial with n = 10 and p = 0.4. 

(b) binomial with n = 3 and p = 0.4. 


(c) not binomial. 


14.14 


14.15 


In a test for ESP (extrasensory perception), a subject is told that 
cards the experimenter can see but he cannot contain either a star, a 
circle, a wave, or a square. As the experimenter looks at each of five 
cards in turn, the subject names the shape on the card. A subject 
who is just guessing has probability 0.25 of guessing correctly on 
each card. Exercises 14.16 to 14.18 use this information. 
14.16 Ifthe subject guesses two shapes correctly and three 
incorrectly, in how many ways can you arrange the 
sequence of correct and incorrect guesses? 


| CHAPTER 14 EXERCISES 


14.22 Binomial setting? In each of the following situa- 
tions, is it reasonable to use a binomial distribution 
for the random variable X? Give reasons for your an- 
swer in each case. 


(a) An auto manufacturer chooses one car from 
each hour’s production for a detailed quality inspec- 
tion. One variable recorded is the count X of finish 
defects (dimples, ripples, etc.) in the car’s paint. 

(b) The pool of potential jurors for a murder case 
contains 100 persons chosen at random from the 
adult residents of a large city. Each person in the 
pool is asked whether he or she opposes the death 
penalty; X is the number who say “Yes.” 

(c) Joe buys a ticket in his state’s Pick 3 lottery 
game every week; X is the number of times in a year 
that he wins a prize. 


14.23 Binomial setting? A binomial distribution will be 


approximately correct as a model for one of these 


aes w=» (=. 


Assume the subject’s guesses are independent of 
each other. The probability that the subject guesses 
the shape correctly on the first and last card, but 
incorrectly on the other three cards, is about 


(a) 0.264. (b) 0.026. (c) 0.090. 


Assume the subject’s guesses are independent of 
each other. The probability that the subject guesses 
the shape correctly on two out of the five cards is 
about 


(a) 0.264. (b) 0.026. (c) 0.090. 


Each entry in a table of random digits like Table B has probability 
0.1 of being any of the 10 digits 0 to 9, and digits are independent of 
each other. Exercises 14.19 to 14.21 use this setting. 


14.17 


14.18 


14.19 The probability of an entry being either an 8 or a 9 is 
(a) 0.1. (b) 0.2. (c) 0.4. 
Each line in Table B has 40 digits. The number of 
times an 8 or a 9 occurs in two lines of the table is 

(a) binomial with n = 80 and p = 0.2. 

(b) binomial with n = 80 and p = 0.1. 

(c) binomial with n = 40 and p = 0.2. 

The mean number of times an 8 or a 9 occurs in two 
lines of the table is 


(a) 16. (b) 12.8. 


14.20 


14.21 


(c) 8. 


two sports set- 
tings and not 
for the other. 
Explain why 
by briefly dis- , aT 
cussing both .) fir ki 

settings. a. 


(a) A National 
Football League kicker has made 80% of his field 
goal attempts in the past. This season he attempts 
20 field goals. The attempts differ widely in distance, 
angle, wind, and so on. 

(b) A National Basketball Association player has 
made 80% of his free-throw attempts in the past. 
This season he takes 150 free throws. Basketball 
free throws are always attempted from 15 feet away 
from the basket with no interference from other 
players. 


© David Bergman/Corbis 


14.24 


14.25 


14.26 


Antibiotic resistance. Antibiotic resistance occurs 
when disease-causing microbes become resistant to 
antibiotic drug therapy. Because this resistance is 
typically genetic and transferred to the next genera- 
tions of microbes, it is a very serious public health 
problem. According to the CDC, approximately 
30% of gonorrhea cases tested in 2011 were resistant 
to at least one of the three major antibiotics com- 
monly used to treat gonorrhea.® A physician treated 
10 cases of gonorrhea during one week of 2011. 


(a) What is the distribution of the cases resistant to 
at least one of the three major antibiotics? 

(b) What is the probability that exactly one out 

of the 10 cases was resistant to at least one of the 
three major antibiotics? What is the probability for 
exactly two out of 10? 

(c) Also find the probability that one or more out of 
the 10 were resistant to at least one of the three major 
antibiotics. (Hint: It is easier to first find the probabil- 
ity that exactly zero of the 10 cases were resistant.) 
Random stock prices. A believer in the random 
walk theory of stock markets thinks that an index 
of stock prices has probability 0.65 of increasing in 
any year. Moreover, the change in the index in any 
given year is not influenced by whether it rose or fell 
in earlier years. Let X be the number of years among 
the next five years in which the index rises. 


(a) X has a binomial distribution. What are n and p? 
(b) What are the possible values that X can take? 
(c) Find the probability of each value of X. Draw 

a probability histogram for the distribution of X. 
(See Figure 14.2 (page 334) for an example of a 
probability histogram.) 

(d) What are the mean and standard deviation of 
this distribution? Mark the location of the mean on 
your histogram. 

Roulette—betting on red. A roulette wheel has 38 
slots, numbered 0, 00, and 1 to 36. The slots 0 and 
00 are colored green, 18 of the others are red, and 
18 are black. The dealer spins the wheel and at the 
same time rolls a small ball along the wheel in the 
opposite direction. The wheel is carefully balanced 
so that the ball is equally likely to land in any slot 
when the wheel slows. Gamblers can bet on various 
combinations of numbers and colors. 

(a) If you bet on “red,” you win if the ball lands in 
ared slot. What is the probability of winning with a 
bet on red in a single play of roulette? 

(b) You decide to play roulette four times, each 
time betting on red. What is the distribution of X, 
the number of times you win? 

(c) If you bet the same amount on each play and win 

on exactly two of the four plays, then you will “break 
even.” What is the probability that you will break even? 


14.27 


14.28 


14.29 


Exercises 


341 


(d) If you win on fewer than two of the four plays, 
then you will lose money. What is the probability 
that you will lose money? 


The birth control shot. The birth control shot is 
one of the most effective methods of birth control 
available, and it works best when you get the shot 
regularly, every 12 weeks. Under ideal conditions, 
only 1% of women taking the shot become pregnant 
within one year. In typical use, however, 3% become 
pregnant.’ Choose at random 20 women using the 
shot as their method of birth control. We count the 
number who become pregnant in the next year. 


Urbano Delvalle/The LIFE Images 
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(a) Explain why this is a binomial setting. 

(b) What is the probability that at least one of the 
women becomes pregnant under ideal conditions? 
What is the probability in typical use? 


Roulette, continued. You decide to play roulette 
200 times, each time betting the same amount on 
red. You will lose money if you win on fewer than 
100 of the plays. Based on the information in 
Exercise 14.26, what is the probability that you will 
lose money? (Check that the Normal approximation 
is permissible and use it to find this probability. If 
your software allows, find the exact binomial prob- 
ability and compare the two results.) In general, if 
you bet the same amount on red every time, you will 
lose money if you win on fewer than half of the plays. 
What do you think happens to the probability of 
making money the longer you continue to play? 


The birth control shot, continued. A study of the 
effectiveness of the birth control shot interviews a 
random sample of 600 women who are using the shot 
as their method of birth control. 


(a) Based on the information about typical use in 
Exercise 14.27, what is the probability that at least 
20 of these women become pregnant in the next 
year? (Check that the Normal approximation is 
permissible, and use it to find this probability. If 
your software allows, find the exact binomial prob- 
ability, and compare the two results.) 

(b) We can’t use the Normal approximation to the 
binomial distributions to find this probability under ideal 
conditions as described in Exercise 14.27. Why not? 
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Hitting the fairway. One statistic used to assess pro- 
fessional golfers is driving accuracy, the percent of 
drives that land in the fairway. In 2013, driving accu- 
racy for PGA Tour professionals ranged from about 
45% to about 70%. Tiger Woods, the highest money 
winner on the PGA tour in 2013, only hits the fair- 
way about 63% of the time.’ Although Woods’s av- 
erage driving distance has decreased almost 10 yards 
over the last 10 years, his percent of drives that hit 
the fairway has increased about 7%, which is not sur- 
prising as increased distance is generally associated 
with decreased accuracy. 


(a) Woods hits 14 drives in a round. What assump- 
tions must you make to use a binomial distribution 
for the count X of fairways he hits? Which of these 
assumptions is least realistic? 

(b) Assuming that a binomial distribution can be 
used, what is the expected number of fairways that 
Woods hits in a round in which he hits 14 drives? 


Genetics. According to genetic theory, the blossom 
color in the second generation of a certain cross of 
sweet peas should be red or white in a 3:1 ratio. That 
is, each plant has probability 3/4 of having red blos- 
soms, and the blossom colors of separate plants are 
independent. 


© blickwinkel/Alamy 


(a) What is the probability that exactly three out of 
four of these plants have red blossoms? 

(b) What is the mean number of red-blossomed 
plants when 60 plants of this type are grown from 
seeds? 

(c) What is the probability of obtaining at least 

45 red-blossomed plants when 60 plants are grown 
from seeds? Use the Normal approximation. If your 
software allows, find the exact binomial probability 
and compare the two results. 

False positives in testing for HIV. A rapid test for 
the presence in the blood of antibodies to HIV, the 
virus that causes AIDS, gives a positive result with 
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probability about 0.004 when a person who is free of 
HIV antibodies is tested. A clinic tests 1000 people 
who are all free of HIV antibodies. 


(a) What is the distribution of the number of 
positive tests? 

(b) What is the mean number of positive tests? 

(c) You cannot safely use the Normal approximation 
for this distribution. Explain why. 


Hyundai sales in 2013. Hyundai Motor America sold 
720,783 vehicles in the U.S. in 2013, with the U.S.- 
built Elantra leading sales with 247,912 cars sold, up 
23% from 2012. The other top selling nameplates in 
2013 were the Sonata with 203,648 sold, the Santa Fe 
with 88,844 sold, and the Accent with 60,458 sold.’ 
The company wants to undertake a survey of 2013 
Hyundai buyers to ask them about satisfaction with 
their purchase. 


(a) What proportion of the Hyundais sold in 2013 
were Elantras? 

(b) If they plan to survey a simple random sample of 
1200 Hyundai buyers, what is the expected number 
and standard deviation of the number of Elantra 
buyers in the sample? 

(c) What is the probability they will get fewer than 
400 Elantra buyers in their sample? 


Retention rates in a weight loss program. Ameri- 
cans spend over $30 billion annually on a variety of 
weight loss products and services. In a study of re- 
tention rates of those using the Rewards Program at 
Jenny Craig in 2005, it was found that about 18% 
of those who began the program dropped out in the 
first four weeks.'° Assume we have a random sample 
of 300 people beginning the program. 


(a) What is the mean number of people who would 
drop out of the Rewards Program within four weeks 
in a sample of this size? What is the standard 
deviation? 

(b) What is the approximate probability that at 
least 235 people in the sample will still be in the 
Rewards Program after the first four weeks? 


Multiple-choice tests. Here is a simple probability 
model for multiple-choice tests. Suppose each stu- 
dent has probability p of correctly answering a ques- 
tion chosen at random from a universe of possible 
questions. (A strong student has a higher p than a 
weak student.) Answers to different questions are 
independent. 


(a) Stacey is a good student for whom p = 0.75. 
Use the Normal approximation to find the 
probability that Stacey scores between 70% and 
80% on a 100-question test. 

(b) If the test contains 250 questions, what is the 
probability that Stacey will score between 70% and 
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80%? You see that Stacey’s score on the longer test 
is more likely to be close to her “true score.” 


Is this coin balanced? While he was a prisoner of 
war during World War II, John Kerrich tossed a coin 
10,000 times. He got 5067 heads. If the coin is per- 
fectly balanced, the probability of a head is 0.5. Is 
there reason to think that Kerrich’s coin was not bal- 
anced? To answer this question, find the probability 
that tossing a balanced coin 10,000 times would give 
a count of heads at least this far from 5000 (that is, 
at least 5067 heads or no more than 4933 heads.) 


Binomial variation. Never forget that probability 
describes only what happens in the long run. Exam- 
ple 14.5 concerns the count of unmarketable toma- 
toes in inspection samples of size 10. The count has 
the binomial distribution with n = 10 and p = 0.11. 
The Probability applet simulates inspecting an SRS 
of 10 tomatoes if you set the probability of heads to 
0.11, toss 10 times, and let each head stand for an 
unmarketable tomato. 


(a) The mean number of unmarketable tomatoes in 
a sample is 1.1. Click “Toss” and “Reset” repeatedly 
to simulate 20 samples. How many unmarketable 
tomatoes did you find in each sample? How close to 
the mean 1.1 is the average number of unmarket- 
able tomatoes in these samples? 

(b) Example 14.5 shows that the probability of ex- 
actly one unmarketable tomato is 0.3854. How close 
to the probability is the proportion of the 20 samples 
that have exactly one unmarketable tomato? 


Whooping cough. Whooping cough (pertussis) is a highly conta- 
gious bacterial infection that was a major cause of childhood deaths 
before the development of vaccines. About 80% of unvaccinated 
children who are exposed to whooping cough will develop the infec- 
tion, as opposed to only about 5% of vaccinated children. Exercises 
14.38 to 14.41 are based on this information. 


14.38 
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Vaccination at work. A group of 20 children at a 
nursery school are exposed to whooping cough by 
playing with an infected child. 


(a) If all 20 have been vaccinated, what is the mean 
number of new infections? What is the probability 
that no more than two of the 20 children develop 
infections? 

(b) If none of the 20 have been vaccinated, what 

is the mean number of new infections? What is 

the probability that 18 or more of the 20 children 
develop infections? 


A whooping cough outbreak. In 2007, Bob Jones 
University in Greenville, SC, ended its fall semester 
a week early because of a whooping cough outbreak; 
158 students were isolated and another 1200 given 
antibiotics as a precaution.'! Authorities react strongly 
to whooping cough outbreaks because the disease 
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is so contagious. Because the effect of childhood 
vaccination often wears off by late adolescence, treat 
the Bob Jones students as if they were unvaccinated. 
It appears that about 1400 students were exposed. 
What is the probability that at least 75% of these stu- 
dents develop infections if not treated? (Fortunately, 
whooping cough is much less serious after infancy.) 


A mixed group: means. A group of 20 children at 
a nursery school are exposed to whooping cough by 
playing with an infected child. Of these children, 17 
have been vaccinated and three have not. 

(a) What is the distribution of the number of new 
infections among the 17 vaccinated children? What 
is the mean number of new infections? 

(b) What is the distribution of the number of new 
infections among the three unvaccinated children? 
What is the mean number of new infections? 

(c) Add your means from parts (a) and (b). This 

is the mean number of new infections among all 

20 exposed children. 

A mixed group: probabilities. We would like to find 
the probability that exactly two of the 20 exposed 
children in Exercise 14.40 develop whooping cough. 
(a) One way to get two infections is to get one 
among the 17 vaccinated children and one among 
the three unvaccinated children. Find the probability 
of exactly one infection among the 17 vaccinated 
children. Find the probability of exactly one infec- 
tion among the three unvaccinated children. These 
events are independent: what is the probability of 
exactly one infection in each group? 

(b) Write down all the ways in which two infec- 
tions can be divided between the two groups of 
children. Follow the pattern of part (a) to find the 
probability of each of these possibilities. Add all 

of your results (including the result of part (a)) 

to obtain the probability of exactly two infections 
among the 20 children. 

Estimation 7 from random numbers. Kenyon Col- 
lege student Eric Newman used basic geometry to 
evaluate software random number generators as part 
of a summer research project. He generated 2000 in- 
dependent random points (X, Y) in the unit square. 
(That is, X and Y are independent random numbers 
between 0 and 1, each having the density function 
illustrated in Figure 12.4 (page 290). The probability 
that (X, Y) falls in any region within the unit square 
is the area of the region.) ” 

(a) Sketch the unit square, the region of possible 
values for the point (X, Y). 

(b) The set of points (X, Y) where X? + Y’ < 1 
describes a circle of radius 1. Add this circle to your 
sketch in part (a), and label the intersection of the 
two regions A. 
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(c) Let T be the total number of the 2000 points 
that fall into the region A. T follows a binomial 
distribution. Identify n and p. (Hint: Recall that the 
area of a circle is wr’). 

(d) What are the mean and standard deviation of T? 
(e) Explain how Eric used a random number 
generator and the facts given here to estimate 7. 


The continuity correction. One reason why the 
Normal approximation may fail to give accurate 
estimates of binomial probabilities is that the bi- 
nomial distributions are discrete and the Normal 
distributions are continuous. That is, counts take 
only whole number values, but Normal variables 
can take any value. We can improve the Normal 
approximation by treating each whole number 
count as if it occupied the interval from 0.5 below 
the number to 0.5 above the number. For example, 


Exploring the Web 


approximate a binomial probability P(X = 10) by 
finding the Normal probability P(X = 9.5). Be care- 
ful: binomial P(X > 10) is approximated by Normal 
P(X = 10.5). 

We saw in Exercise 14.30 that Tiger Woods hits 
the fairway in 63% of his drives. We will assume that 
his drives are independent and that each has prob- 
ability 0.63 of hitting the fairway. Suppose Woods 
drives 30 times. The exact binomial probability that 
he hits 23 or more fairways is 0.084. 


(a) Show that this setting satisfies the rule of thumb 
for use of the Normal approximation (just barely). 
(b) What is the Normal approximation to 
P(X = 23)? 

(c) What is the Normal approximation using the 
continuity correction? That’s a lot closer to the 

true binomial probability. 


14.44 MCAT physical science score. Go to the website www. aamc.org/students/ 
applying/mcat/data/mcat_stats/, which reports the distribution of the 
total scaled MCAT composite score as well as the scores on the individual areas 
of assessment for recent years. The areas of assessment include three multiple- 
choice portions scored on a 15-point scale, which are combined to give the overall 


composite score. 


(a) Open the PDF with the percentage and scaled score tables for the most 
recent year provided. Find the percentage of students who obtained a physical 
science section score of 9 or greater. 


(b) A survey organization is planning on contacting an SRS of 300 of the 
examinees from the most recent year to see how they prepared for the MCAT 
exam. What is the distribution of the number of examinees in the sample that had 
a physical science section score of 9 or greater? 


(c) Use the Normal distribution to approximate the probability that at least 
half of the examinees in the sample had a physical science section score of 9 or 


greater. 


(d) Use software or a calculator to compute the exact probability that at 
least half of the examinees in the sample had a physical science section score of 
9 or greater. How do the exact probability and the Normal approximation to this 
probability compare? Which of the two answers would you report to the survey 


organization? Why? 


14.45 Binomial calculators. A number of websites will do exact binomial probability 


calculations for you. 


(a) Find a website with a binomial calculator and give its URL. 


(b) In Example 14.7, the number of people in the sample who track their 
health is a binomial random variable X with n = 3014 and p = 0.62. Use the 
binomial calculator to compute the probability that 1900 or more of the sample 


agree. 
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hat is the average income of American households? Each March, the gov- 

ernment’s Current Population Survey asks detailed questions about income. 

The 98,095 households contacted in March 2013 had a mean “total money 
income” of $71,274 in 2012.! (The median income was of course lower, $51,017.) 
That $71,274 describes the sample, but we use it to estimate the mean income of all 
households. This is an example of statistical inference: we use information from a 
sample to infer something about a wider population. 

Because the results of random samples and randomized comparative experiments 
include an element of chance, we can’t guarantee that our inferences are correct. 
What we can guarantee is that our methods usually give correct answers. The 
reasoning of statistical inference rests on asking, “How often would this method 
give a correct answer if I used it very many times?” If our data come from random 
sampling or randomized comparative experiments, the laws of probability answer 
the question “What would happen if we did this many times?” This chapter presents 
some facts about probability that help answer this question. 


In this chapter 
we cover... 


15.1 Parameters and statistics 


15.2 Statistical estimation and the 
law of large numbers 


15.3. Sampling distributions 


15.4 The sampling distribution 
of xX 


15.5 The central limit theorem 


15.6 Sampling distributions and 
statistical significance 
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As we begin to use sample data to draw conclusions about a wider population, we 
must take care to keep straight whether a number describes a sample or a popula- 
tion. Here is the vocabulary we use. 


Parameter, Statistic 


A parameter is a number that describes the population. In practice, the value of a 
parameter is not known because we can rarely examine the entire population. 


A statistic is a number that can be computed from the sample data without making 
use of any unknown parameters. In practice, we often use a statistic to estimate an 
unknown parameter. 


The mean income of the sample of 98,095 households contacted by the Current 
Population Survey was x = $71, 274. The number $71,274 is a statistic because it 
describes this one Current Population Survey sample. The population that the poll wants 
to draw conclusions about is all 122 million U.S. households. The parameter of interest is 
the mean income of all these households. We don’t know the value of this parameter. m 


Remember s and p: statistics come from samples, and parameters come from popula- 
tions. As long as we were just doing data analysis, searching for patterns, or summarizing 
features of our data, the distinction between population and sample was not important. 
Now, as we begin to understand what our data (sample) tell us about a population, it 
is essential. The notation we use must reflect this distinction. We write ps (the Greek 
letter mu) for the population mean and o@ (the Greek letter sigma) for the population 

population mean » __ standard deviation. These are fixed parameters that are unknown when we use a 

sample for inference. The sample mean is the familiar x, the average of the observations 

in the sample. The sample standard deviation is denoted by s, the standard deviation 

of the observations in the sample. These are statistics that would almost certainly take 

sample standard deviation s different values if we chose another sample from the same population. The sample mean 
x and sample standard deviation s from a sample or an experiment are estimates of the 
mean p and standard deviation o of the underlying population. 


() LaunchPad Online Resources 


* The StatClips video, Introduction to Statistics—Populations, Parameters, 
Samples, and Sample Statistics, discusses the meaning of these four terms. 


Apply Your Knowledge 


15.1 Genetic Engineering. Here’s an idea for treating advanced melanoma, the 
most serious kind of skin cancer: genetically engineer white blood cells to bet- 
ter recognize and destroy cancer cells, then infuse these cells into patients. The 
subjects in a small initial study of this approach were 11 patients whose mela- 
noma had not responded to existing treatments. One outcome of this experi- 
ment was measured by a test for the presence of cells that trigger an immune 
response in the body and so may help fight cancer. The mean counts of active 
cells per 100,000 cells for the 11 subjects were 3.8 before infusion and 160.2 
after infusion. Is each of the boldface numbers a parameter or a statistic? 


population standard deviation o 


sample mean xX 
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15.2 Florida Voters. Florida played a key role in recent presidential elections. 
Voter registration records in February 2014 show that 39% of Florida voters 
are registered as Democrats and 35% as Republicans. (Most of the others did 
not choose a party.) To test a random digit dialing device that you plan to use 
to poll voters for the 2014 Senate elections, you use it to call 250 randomly 
chosen residential telephones in Florida. Of the registered voters contacted, 
34% are registered Democrats. Is each of the boldface numbers a parameter 
or a statistic? 


15.3 Steroid Use. Researchers surveyed 500 American anabolic androgenic ste- 
roid users, ranging in age from 16 to 62, and found that 98.8% of them were 
male. The proportion of all Americans between the ages of 16 and 62 who are 
male is 50.0%. The median age at which those surveyed began using steroids 
was 22. Is each of the boldface numbers a parameter or a statistic? 


Statistical inference uses sample data to draw conclusions about the entire popu- 
lation. Because good samples are chosen randomly, statistics such as x computed 
from these samples are random variables. We can describe the behavior of a sample 
statistic by a probability model that answers the question, “What would happen if 
we did this many times?” Here is an example that will lead us toward the probability 
ideas most important for statistical inference. 


ay HIGH-TECH 
GAMBLING 

There are twice as many slot 
machines as bank ATMs in the 
United States. Once upon a time, 

you put in a coin and pulled the 

lever to spin three wheels, each with 
20 symbols. No longer. Now the 
machines are video games with flashy 
graphics and outcomes produced by 
random number generators. Machines 
can accept many coins at once, can 
pay off on a bewildering variety of 
outcomes, and can be networked to 
allow common jackpots. Gamblers still 
search for systems, but in the long run 
the law of large numbers guarantees 
the house its 5% profit. 


One of the reasons that winemaking is referred to as an art is because so many 
things can go wrong during production. Wine is chemically delicate and must be care- 
fully supervised and nurtured. Sulfur compounds such as dimethyl sulfide (DMS) are 
formed naturally in the process of making wine. It is present in all wines. At low levels, 
it contributes to roundness, fruitiness, and the complexity of wine. DMS increases 
with wine age, and can lead to interesting sensory characteristics, such as increased 
complexity. Unfortunately, at higher levels it can contribute a vegetative, cooked cab- 
bage, onion-like, or sulfur smell. In restaurants, when you order a bottle of wine, you 
are often presented with a small sample of the newly opened bottle to confirm that 
there are no unpleasant odors. 

Winemakers need to control DMS in wine and understand at what levels it causes 
a wine to smell bad. Thus, winemakers want to know the “odor threshold,” the lowest 
concentration of DMS that the human nose can detect. People vary in their ability 
to detect DMS, and it is important to understand this variation. For example, wine 
aficionados often have sensitive palates and may have a lower-than-average odor 
threshold. 

Because different people have different thresholds, we start by asking about the 
mean threshold yw in the population of all adults. The number yw is a parameter that 
describes this population. 

To estimate jz, we present tasters with both natural wine and the same wine 
spiked with DMS at different concentrations to find the lowest concentration at which 
they identify the spiked wine. Here are the odor thresholds (measured in micrograms 
of DMS per liter of wine) for 10 randomly chosen subjects: 


28 40 28 33 20 31 29 27 17 21 


Jupiterimages/Getty Images 


348 CHAPTER 15 @ Sampling Distributions 


The mean threshold for these subjects is x = 27.4. It seems reasonable to use the 
sample result x = 27.4 to estimate the unknown yw. An SRS should fairly represent 
the population, so the mean x of the sample should be somewhere near the mean 
pw of the population. Of course, we don’t expect x to be exactly equal to uw. We 
realize that if we choose another SRS, the luck of the draw will probably produce 
a different x. m 


If x is rarely exactly right and varies from sample to sample, why is it nonetheless 
a reasonable estimate of the population mean w? Here is one answer: if we keep on 
taking larger and larger samples, the statistic X is guaranteed to get closer and closer to 
the parameter ju. We have the comfort of knowing that if we can afford to keep on 
measuring more subjects, eventually we will estimate the mean odor threshold of all 
adults very accurately. This remarkable fact is called the law of large numbers. It is 
remarkable because it holds for any population, not just for some special class such 
as Normal distributions. 


Law of Large Numbers 


Draw observations at random from any population with finite mean ww. As the num- 
ber of observations drawn increases, the mean x of the observed values tends to get 
closer and closer to the mean p of the population. 


The law of large numbers can be proven mathematically starting from the 
basic laws of probability. The behavior of x is similar to the idea of probability. 
In the long run, the proportion of outcomes taking any value gets close to the 
probability of that value, and the average outcome gets close to the population 
mean. Figure 12.1 (page 279) shows how proportions approach probability in 
one example. Here is an example of how sample means approach the popula- 
tion mean. 


Suppose the distribution of odor thresholds among all adults has mean 25. The mean 
ft = 25 is the true value of the parameter we seek to estimate. Figure 15.1 shows 
how the sample mean x of an SRS drawn from this population changes as we add 
more subjects to our sample. 

The first subject in Example 15.2 had threshold 28, so the line in Figure 15.1 
starts there. The mean for the first two subjects is 


28 +40 _ 


34 
2 


Y= 

This is the second point on the graph. At first, the graph shows that the mean of 
the sample changes as we take more observations. Eventually, however, the mean of 
the observations gets close to the population mean w = 25 and settles down at that 
value. 

If we started over, again choosing people at random from the population, we 
would get a different path from left to right in Figure 15.1. The law of large numbers 
says that whatever path we get will always settle down at 25 as we draw more and 
more people. = 
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FIGURE 15.1 

The law of large numbers in action: as 
we take more observations, the sample 
mean X always approaches the mean mw 
of the population. 


Mean of first n observations 
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The Law of Large Numbers applet animates Figure 15.1 in a different setting. 
You can use the applet to watch x change as you average more observations until it 
eventually settles down at the mean wp. 

The law of large numbers is the foundation of such business enterprises as gam- 
bling casinos and insurance companies. The winnings (or losses) of a gambler on a 
few plays are uncertain—that’s why some people find gambling exciting. In Figure 
15.1, the mean of even 100 observations is not yet very close to m. It is only in the 
long run that the mean outcome is predictable. The house plays tens of thousands 
of times. So the house, unlike individual gamblers, can count on the long-run regu- 
larity described by the law of large numbers. The average winnings of the house 
on tens of thousands of plays will be very close to the mean of the distribution of 
winnings. Needless to say, this mean guarantees the house a profit. That’s why 
gambling can be a business. 


Apply Your Knowledge 


15.4 The Law of Large Numbers Made Visible. Roll two balanced dice and 
count the total spots on the up-faces. The probability model appears in 
Example 12.5 (page 283). You can see that this distribution is symmetric with 7 
as its center, so it’s no surprise that the mean is p = 7. This is the population 
mean for the idealized population that contains the results of rolling two dice 
forever. The law of large numbers says that the average x from a finite number 
of rolls tends to get closer and closer to 7 as we do more and more rolls. 


(a) Click “More dice” once in the Law of Large Numbers applet to get two 
dice. Click “Show mean” to see the mean 7 on the graph. Leaving the 
number of rolls at 1, click “Roll dice” three times. How many spots did 
each roll produce? What is the average for the three rolls? You see that 
the graph displays at each point the average number of spots for all 
rolls up to the last one. This is exactly like Figure 15.1. 
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(b) Click “Reset” to start over. Set the number of rolls to 100 and click 
“Roll dice.” The applet rolls the two dice 100 times. The graph shows 
how the average count of spots changes as we make more rolls. That 
is, the graph shows Xx as we continue to roll the dice. Sketch (or print 
out) the final graph. 


(c) Repeat your work from part (b). Click “Reset” to start over, then roll 
two dice 100 times. Make a sketch of the final graph of the mean x 
against the number of rolls. Your two graphs will often look very dif- 
ferent. What they have in common is that the average eventually gets 
close to the population mean yz = 7. The law of large numbers says 
that this will always happen if you keep on rolling the dice. 

15.5 Insurance. The idea of insurance is that we all face risks that are unlikely 
but carry high cost. Think of a fire or flood destroying your apartment. Insur- 
ance spreads the risk: we all pay a small amount, and the insurance policy 
pays a large amount to those few of us whose apartments are damaged. An 
insurance company looks at the records for millions of apartment owners and 
sees that the mean loss from apartment damage in a year is w = $125 per 
person. (Most of us have no loss, but a few lose most of their possessions. The 
$125 is the average loss.) The company plans to sell renters insurance for 
$125 plus enough to cover its costs and profit. Explain clearly why it would 
be unwise to sell only 12 policies. Then explain why selling thousands of such 
policies is a safe business. 


The law of large numbers assures us that if we measure enough subjects, the 
statistic x will eventually get very close to the unknown parameter p. But the 
odor threshold study in Example 15.2 had just 10 subjects. What can we say about 
estimating wz by x from a sample of 10 subjects? Put this one sample in the context 
of all such samples by asking, “What would happen if we took many samples of 
10 subjects from this population?” Here’s how to answer this question: 


Take a large number of samples of size 10 from the population. 
Calculate the sample mean x for each sample. 


Make a histogram of the values of x. 


Examine the shape, center, and variability of the distribution displayed in the 
histogram. 


In practice it is too expensive to take many samples from a large population such 
as all adult U.S. residents. But we can imitate many samples by using software. 
Using software to imitate chance behavior is called simulation. 


Extensive studies have found that the DMS odor threshold of adults follows roughly 
a Normal distribution with mean ww = 25 micrograms per liter and standard devia- 
tion o@ = 7 micrograms per liter. We call this the population distribution of odor 
threshold. 

Figure 15.2 illustrates the process of choosing many samples and finding the 
sample mean threshold x for each one. Follow the flow of the figure from the popu- 
lation at the left, to choosing an SRS and finding the x for this sample, to collecting 


15.3 


Take many SRSs and collect 
their means x. 


The distribution of all 
the X's 1s close to Normal. 


SRS size 10 


—— X = 26.42 


SRS size 10 


———> X = 24.28 


SRS size 10 
——sSsS——— 


Population, 20 
mean m = 25 


FIGURE 15.2 


Sampling Distributions 


The idea of a sampling distribution: take many samples from the same population, collect the x's from all the samples, and display the 


distribution of the x's. The histogram shows the results of 1000 samples. 


together the x’s from many samples. The first sample has xX = 26.42. The second 
sample contains a different 10 people, with x = 24.28, and so on. The histogram at 
the right of the figure shows the distribution of the values of x from 1000 separate 
SRSs of size 10. This histogram displays the sampling distribution of the statistic X. m 


Population Distribution, Sampling Distribution 


The population distribution of a variable is the distribution of values of the vari- 
able among all the individuals in the population. 

The sampling distribution of a statistic is the distribution of values taken by the 
statistic in all possible samples of the same size from the same population. 


make up the population. A sampling distribution describes how a statistic 
varies in many samples from the population. 

Strictly speaking, the sampling distribution is the ideal pattern that would emerge 
if we looked at all possible samples of size 10 from our population. A distribution 
obtained from a fixed number of trials, like the 1000 trials in Figure 15.2, is only an 
approximation to the sampling distribution. One of the uses of probability theory in 
statistics is to obtain sampling distributions without simulation. The interpretation 
of a sampling distribution is the same, however, whether we obtain it by simulation 
or by the mathematics of probability. 

We can use the tools of data analysis to describe any distribution. Let’s apply 
those tools to Figure 15.2. What can we say about the shape, center, and variability 
of this distribution? 


A Be careful: The population distribution describes the individuals that 


= Shape: It looks Normal! Detailed examination confirms that the distribution 
of x from many samples is very close to Normal. 
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m Center: The mean of the 1000 x’s is 24.95. That is, the distribution is 
centered very close to the population mean = 25. 


= Variability: The standard deviation of the 1000 x’s is 2.217, notably smaller 
than the standard deviation o = 7 of the population of individual subjects. 


Although these results describe just one simulation of a sampling distribution, they 
reflect facts that are true whenever we use random sampling. 


Apply Your Knowledge 


15.6 Sampling Distribution versus Population Distribution. The 2012 
American Time Use Survey contains data on how many minutes of sleep 
per night each of 12,443 survey participants estimated they get.” The times 
follow the Normal distribution with mean 528.8 minutes and standard de- 
viation 137.2 minutes. An SRS of 100 of the participants has a mean time 
of x = 509.23 minutes. A second SRS of size 100 has mean x = 530.32 
minutes. After many SRSs, the many values of the sample mean x follow 
the Normal distribution with mean 528.8 minutes and standard deviation 
13.72 minutes. 


(a) What is the population? What values does the population distribution 
describe? What is this distribution? 


(b) What values does the sampling distribution of x describe? What is the 
sampling distribution? 


15.7 Generating a Sampling Distribution. Let’s illustrate the idea of a sam- 
pling distribution in the case of a very small sample from a very small popula- 
tion. The population is the scores of 10 students on an exam: 


Student 1 2 3 4 5 6 i 8 9 10 
Score 86 63 81 55 ie ie 65 66 75 59 


The parameter of interest is the mean score pw in this population. The sam- 
ple is an SRS of size n = 4 drawn from the population. The Simple Random 
Sample applet can be used to select simple random samples of four single digit 
numbers between | and 10, corresponding to the students. 


(a) Make a histogram of these 10 scores. 

(b) Find the mean of the 10 scores in the population. This is the popula- 
tion mean pL. 

(c) Use the Simple Random Sample applet to draw an SRS of size 4 from 
this population. What are the four scores in your sample? What is their 
mean x? This statistic is an estimate of jw. 

(d) Repeat this process nine more times, using the applet. Make a histogram 
of the 10 values of x. You are constructing the sampling distribution of 
x. Is the center of your histogram close to jz? How does the shape of this 
histogram compare with the histogram you made in part (a)? 


Figure 15.2 suggests that when we choose many SRSs from a population, the 
sampling distribution of the sample means is centered at the mean of the original 
population and is less variable (spread out) than the distribution of individual obser- 
vations. Here are the facts. 


15.4 The Sampling Distribution of x 


Mean and Standard Deviation of a Sample Mean? 


Suppose that x is the mean of an SRS of size n drawn from a large population with 
mean p and standard deviation a. Then the sampling distribution of x has mean 
and standard deviation a/ Vn. 


By “large population” we mean that the size of the population is much larger than 
the size of the sample—say, at least 20 times as large. 

These facts about the mean and the standard deviation of the sampling distribu- 
tion of x are true for any population, not just for some special class such as Normal 
distributions. They have important implications for statistical inference: 


= The mean of the statistic x is always equal to the mean pw of the population. 
That is, the sampling distribution of x is centered at wz. In repeated sampling, 
x will sometimes fall above the true value of the parameter and sometimes 
below, but there is no systematic tendency to overestimate or underestimate 
the parameter. This makes the idea of lack of bias in the sense of “no 
favoritism” more precise. Because the mean of x is equal to pz, we say that 
the statistic x is an unbiased estimator of the parameter pm. 


= An unbiased estimator is “correct on the average” in many samples. How 
close the estimator falls to the parameter in most samples is determined 
by the variability of the sampling distribution. If individual observations 
have standard deviation a, then sample means x from samples of size n 
have standard deviation o/ Vn. That is, averages are less variable than 
individual observations. 


= Not only is the standard deviation of the distribution of x smaller than the 
standard deviation of individual observations, but it gets smaller as we take 
larger samples. The results of large samples are less variable than the 
results of small samples. 


The upshot of all this is that we can trust the sample mean from a 
A large random sample to estimate the population mean accurately. If the 
sample size n is large, the standard deviation of x is small, and almost all 
samples will give values of x that lie very close to the true parameter jw. However, 
the standard deviation of the sampling distribution gets smaller only at the rate Vn. 
To cut the standard deviation of x in half, we must take four times as many observa- 
tions, not just twice as many. So very precise estimates (estimates with very small 
standard deviation) may be expensive. 

We have described the center and variability of the sampling distribution of a 
sample mean x, but not its shape. The shape of the sampling distribution depends 
on the shape of the population distribution. In one important case there is a simple 
relationship between the two distributions: if the population distribution is Normal, 
then so is the sampling distribution of the sample mean. 


Sampling Distribution of a Sample Mean 


If individual observations have the N(, o) distribution, then the sample mean x of 
an SRS of size n has the N(w, o/ Vn) distribution. 


Notice that if the population distribution is Normal, then the sampling distribu- 
tion of the sample mean is Normal regardless of the sample size n. 
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unbiased estimator 


ay SAMPLE SIZE 
MATTERS 


The new thing in baseball is using 
statistics to evaluate players, with 
new measures of performance to 

help decide which players are worth 
the high salaries they demand. This 
challenges traditional subjective 
evaluation of young players and the 
usefulness of traditional measures such 
as batting average. But success has 
led many major league teams to hire 
statisticians. The statisticians say that 
sample size matters in baseball also: 
the 162-game regular season is long 
enough for the better teams to come 
out on top, but five game and seven 
game play-off series are so short that 
luck has a lot to do with who wins. 
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FIGURE 15.3 

The distribution of single observations (the 
population distribution) compared with the 
sampling distribution of the means x of 10 


individual observations. 


observations, for Example 15.5. Both have the 
same mean, but averages are less variable than DMS odor threshold 


If we measure the DMS odor thresholds of individual adults (as described in 
Example 15.2), the values follow the Normal distribution with mean w= 25 micro- 
grams per liter and standard deviation @ = 7 micrograms per liter. This is the popu- 
lation distribution of odor threshold. 

Take many SRSs of size 10 from this population and find the sample mean x for 
each sample, as in Figure 15.2. The sampling distribution describes how the values 
of x vary among samples. That sampling distribution is also Normal, with mean 
b= 25 and standard deviation 


oe ee es 
Vn Vio 

Figure 15.3 contrasts these two Normal distributions. Both are centered at the popu- 
lation mean, but sample means are much less variable than individual observations. 


The sampling distribution 
describes how sample means 
X vary in repeated samples. 


The population distribution 


describes how individuals 
vary in the population. 


The smaller variation of sample means shows up in probability calculations. You 
can show (using software or standardizing and using Table A) that about 52% of 
all adults have odor thresholds between 20 and 30. But almost 98% of means of 
samples of size 10 lie in this range. m 


(4) LaunchPad Online Resources 


* The StatBoards video, x Is Normal for All Sample Sizes If the Population 
Is Normal, discusses examples of computing the sampling distribution of x 
when the population is normal. 


Apply Your Knowledge 


15.8 A Sample of Young Men. A government sample survey plans to measure 
the LDL (bad) cholesterol level of an SRS of men aged 20 to 34. The re- 
searchers will report the mean x from their sample as an estimate of the mean 
LDL cholesterol level yz in this population. 


15.5 The Central Limit Theorem 


(a) Explain to someone who knows no statistics what it means to say that 
x is an “unbiased” estimator of w. 


(b) The sample result x is an unbiased estimator of the population truth 
no matter what size SRS the study uses. Explain to someone who 
knows no statistics why a large sample gives more trustworthy results 
than a small sample. 


15.9 Larger Sample, More Accurate Estimate. Suppose that in fact the LDL 
cholesterol level of all men aged 20 to 34 follows the Normal distribution 
with mean x = 115 milligrams per deciliter (mg/dL) and standard deviation 


o = 25 mg/dL. 


(a) Choose an SRS of 100 men from this population. What is the sampling 
distribution of x? What is the probability that x takes a value between 
112 and 118 mg/dL? This is the probability that x estimates ps within 
+3 mg/dL. 

(b) Choose an SRS of 1000 men from this population. Now what is the 
probability that x falls within +3 mg/dL of jz? The larger sample is 
much more likely to give an accurate estimate of ju. 

15.10 Measurements in the Lab. Juan makes a measurement in a chemistry 
laboratory and records the result in his lab report. Suppose that if Juan makes 
this measurement repeatedly, the standard deviation of his measurements 
will be o = 10 milligrams. Juan repeats the measurement four times and 
records the mean x of his four measurements. 


(a) What is the standard deviation of Juan’s mean result? (That is, if Juan 
kept on making four measurements and averaging them, what would 
be the standard deviation of all his x values?) 

(b) How many times must Juan repeat the measurement to reduce the 
standard deviation of x to 2? Explain to someone who knows no sta- 
tistics the advantage of reporting the average of several measurements 
rather than the result of a single measurement. 


The facts about the mean and standard deviation of x are true no matter 
what the shape of the population distribution may be. But what is the shape of 
the sampling distribution when the population distribution is not Normal? It is a 
remarkable fact that as the sample size increases, the distribution of x changes shape: it 
looks less like that of the population and more like a Normal distribution. When the 
sample is large enough, the distribution of x is very close to Normal. This is true 
no matter what shape the population distribution has, as long as the population 
has a finite standard deviation a. This famous fact of probability theory is called 
the central limit theorem. It is much more useful than the fact that the distribution 
of x is exactly Normal if the population is exactly Normal. 


Central Limit Theorem 


Draw an SRS of size n from any population with mean py and finite standard devia- 
tion a. The central limit theorem says that when n is large, the sampling distribu- 
tion of the sample mean x is approximately Normal: 


Oo 


x is approximately N (u. =) 
n 
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ay WHAT WAS THAT 
PROBABILITY 
AGAIN? 


Wall Street uses fancy mathematics 
to predict the probabilities that 
fancy investments will go wrong. 
The probabilities are always too 
low—sometimes because something 
was assumed to be Normal but 

was not. Probability predictions in 
other areas also go wrong. In mid- 
September 2007, the New York Mets 
had probability 0.998 of making the 
National League play-offs, or so an 
elaborate calculation said. Then the 
Mets lost 12 of their final 17 games, 
the Phillies won 13 of their final 17, 
and the Mets were out. 
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The central limit theorem allows us to use Normal probability calculations to answer 
questions about sample means from many observations even when the population 
distribution is not Normal. 


More general versions of the central limit theorem say that the distribution of 
any sum or average of many small random quantities is close to Normal. This is true 
even if the quantities are correlated with each other (as long as they are not too 
highly correlated) and even if they have different distributions (as long as no one 
random quantity is so large that it dominates the others). The central limit theorem 
suggests why the Normal distributions are common models for observed data. Any 
variable that is a sum of many small influences will have approximately a Normal 
distribution. 

How large a sample size n is needed for x to be close to Normal depends on the 
population distribution. More observations are required if the shape of the popula- 
tion distribution is far from Normal. Here are two examples in which the population 
is far from Normal. 


FIGURE 15.4 

The central limit theorem in action, for 
Example 15.6. (a) The distribution 

of earned income in a population of 
73,769 households. (b) The distribution 
of the mean earnings for 500 SRSs of 
100 households each from this popula- 
tion. (c) The distribution of the sample 
means in more detail: the shape is close 
to Normal. 


In March 2013, the Current Population Survey contacted 98,095 households. 
Figure 15.4(a) is a histogram of the earnings of the 73,769 households that had 
earned income greater than zero in 2012.* As we expect, the distribution of earned 
incomes is strongly skewed to the right and very spread out. The right tail of the 
distribution is even longer than the histogram shows because there are too few high 
incomes for their bars to be visible on this scale. In fact, we cut off the earnings scale 
at $400,000 to save space—a few households earned even more than $400,000. 
The mean earnings for these 73,769 households was $74,621. 

Regard these 73,769 households as a population with mean w = $74,621. Take 
an SRS of 100 households. The mean earnings in this sample is X = $75,847. That’s 
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FIGURE 15.4 


305 (Continued) 


Because both histograms 
use the same scales, you 
can directly compare this 


N 
(Se) 
1 


with the histogram in 
Figure 15.4(a). 


Percent of samples 
i 
1 


ey 
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0 100 200 300 400 
Mean household earnings for samples of size 100 (thousands of dollars) 


(b) 


This is the same 
histogram pictured 

in Figure 15.4(b), drawn 
ina scale that more 
clearly shows its shape. 


Percent of samples 


50 60 70 80 90 100 
Mean household earnings for samples of size 100 (thousands of dollars) 


(c) 


higher than the mean of the population. Take another SRS of size 100. The mean for 
this sample is X = $66,860. That's less than the mean of the population. What would 
happen if we did this many times? Figure 15.4(b) is a histogram of the mean earnings 
for 500 samples, each of size 100. The scales in Figures 15.4(a) and 15.4(b) are the 
same, for easy comparison. Although the distribution of individual earnings is skewed 
and with large variability, the distribution of sample means is roughly symmetric and 
shows much less variability. 

Figure 15.4(c) zooms in on the center part of the histogram in Figure 15.4(b) to 
more clearly show its shape. Although n = 100 is not a very large sample size and 
the population distribution is extremely skewed, we can see that the distribution of 
sample means is close to Normal. m 
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Comparing Figure 15.4(a) with Figures 15.4(b) and 15.4(c) illustrates the two 
most important ideas of this chapter. 


Thinking about Sample Means 


Means of random samples are less variable than individual observations. 


Means of random samples are more Normal than individual observations. 


Exponential distributions are used as models for the lifetime of electronic components 
and for the time required to serve a customer or repair a machine. Figure 15.5(a) 
shows the exponential population distribution, that is, the density curve of a single 
observation. This distribution is strongly right-skewed, and the most probable out- 
comes are near O. The mean yp of this distribution is 1, and its standard deviation 
o is also 1. 


(a) (b) 


FIGURE 15.5 

The central limit theorem in action, 

or Example 15.7. The distribution 

of sample means x from a strongly 

non-Normal population becomes more 

ormal as the sample size increases. 

(a) The distribution of 1 observation. 

(b) The distribution of X for 2 observa- 

ions. (c) The distribution of X for 10 0) i 0 1 
observations. (d) The distribution of x (c) (d) 
or 25 observations. 


Mathematics can be used to derive the theoretical sampling distribution of x when 
sampling from an exponential distribution. Figures 15.5(b), 15.5(c), and 15.5(d) are 
the theoretical density curves of the sample means of 2, 10, and 25 observations 
from this population. As n increases, the shape becomes more Normal. The mean 
remains at w = 1, and the standard deviation decreases, taking the value 1/Vn. The 
density curve for 10 observations is still somewhat skewed to the right but already 
resembles a Normal curve having w = 1 and o = 1/V10 = 0.32. The density 
curve for n = 25 is yet more Normal. The contrast between the shapes of the popu- 
lation distribution and of the distribution of the mean of 10 or 25 observations is 
striking. m 
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The Central Limit Theorem applet allows you to watch the central limit theorem in 
action. The applet simulates the sampling distribution of x for several population 
distributions and you can see how the sampling distribution changes shape as the 
sample size increases. 

Let’s use Normal calculations based on the central limit theorem to answer a 
question about the very non-Normal distribution in Figure 15.5(a). 


nance on an air-conditioning unit is governed by the exponential distribution whose 
density curve appears in Figure 15.5(a). The exponential distribution arises in many 
engineering and industrial problems, such as time until failure of a machine or time 
until a success. The mean time is ~ = 1 hour and the standard deviation is 0 = 1 
hour. Your company has a contract to maintain 70 of these units in an apartment 
building. You must schedule technicians’ time for a visit to this building. Is it safe to 
budget an average of 1.1 hours for each unit? Or should you budget an average of 
1.25 hours? 


PLAN: We believe that the manufacturing and distribution process associated with 
this type of air-conditioning unit is such that variation from one unit to the next is 
random. Thus, we treat these 70 air conditioners as an SRS from all units of this type. 
What is the probability that the average maintenance time for 70 units exceeds 1.1 
hours? That the average time exceeds 1.25 hours? 


(STATE? The time (in hours) that a technician requires to perform preventive mainte- 42 


ESV The central limit theorem says that the sample mean time x spent working 
on 70 units has approximately the Normal distribution with mean equal to the popula- 
tion mean yw = 1 hour and standard deviation 


1 
= = 0.12 hour 

V70 V70 

The distribution of x is therefore approximately N(1, 0.12). This Normal curve is the 

solid curve in Figure 15.6. 


Exact density 
curve for x. 

Normal curve from the 
central limit theorem. 


FIGURE 15.6 

The exact distribution (dotted) and the 

Normal approximation from the central 

. limit theorem (solid) for the average 
ee time needed to maintain an air condi- 

11 tioner, for Example 15.8. The probability 

we want is the area to the right of 1.1. 
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Using this Normal distribution, the probabilities we want are 
P(x > 1.10 hours) = 0.2014 
P(x > 1.25 hours) = 0.0182 


Software gives these probabilities immediately, or you can standardize and use 
Table A. For example, 


ee 1j0=4 
P(x > 1.10) = p(X) > p(140—*) 
0.12 0.12 
= P(Z > 0.83) = 1 — 0.7967 = 0.2033 


with the usual roundoff error. Don’t forget to use standard deviation 0.12 in your 
software or when you standardize x. 


(SSSR |f you budget 1.1 hours per unit, there is a 20% chance that the techni- 
cians will not complete the work in the building within the budgeted time. This chance 
drops to 2% if you budget 1.25 hours. You should therefore budget 1.25 hours 
per unit. 


Using more mathematics, we can start with the exponential distribution and 
find the actual density curve of x for 70 observations. This is the dotted curve in 
Figure 15.6. You can see that the solid Normal curve is a good approximation. The 
exactly correct probability for 1.1 hours is an area to the right of 1.1 under the dot- 
ted density curve. It is 0.1977. The central limit theorem Normal approximation 
0.2014 is off by only about 0.004. 


(4) LaunchPad Online Resources 


* The StatClips Examples video, Central Limit Theorem Part | Example A, 
and the Snapshots video, Sampling Distributions and CLT, discuss examples 
of the central limit theorem in action. 


* The StatBoards video, x Is Approximately Normal for Large n for Any 
Distribution, discusses examples of computing the sampling distribution of 
X for large sample sizes. 


Apply Your Knowledge 


© Bruce Coleman Inc./Alamy 


15.11 What Does the Central Limit Theorem Say? Asked what the central 
limit theorem says, a student replies, “As you take larger and larger samples 
from a population, the histogram of the sample values looks more and more 
Normal.” Is the student right? Explain your answer. 


15.12 Detecting Gypsy Moths. The gypsy moth is a serious threat to oak and 
aspen trees. A state agriculture department places traps throughout the 
state to detect the moths. When traps are checked periodically, the mean 
number of moths trapped is only 0.5, but some traps have several moths. 
The distribution of moth counts is finite and strongly skewed, with standard 
deviation 0.7. 


(a) What are the mean and standard deviation of the average number of 
moths x in 50 traps? 


(b) Use the central limit theorem to find the probability that the average 
number of moths in 50 traps is greater than 0.6. 
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population of millions of apartment owners, the mean annual loss from dam- 
age is w = $125 and the standard deviation of the loss is a = $300. The 
distribution of losses is strongly right-skewed: most policies have $0 loss, but 
a few have large losses. If the company sells 10,000 policies, can it safely base 
its rates on the assumption that its average loss will be no greater than $135? 
Follow the four-step process as illustrated in Example 15.8. 


15.13 More on Insurance. An insurance company knows that, in the entire 42 


We have looked carefully at the sampling distribution of a sample mean. 
However, any statistic we can calculate from a sample will have a sampling 
distribution. 


In Example 15.5, we took 1000 SRSs of size 10 from a Normal population with 
mean 2 = 25 micrograms per liter and standard deviation o = 7 micrograms per 
liter. This Normal distribution is the distribution of the DMS odor thresholds of all 
adults. Figure 15.3 is a histogram of the distribution of the sample means. 

Now take 1000 SRSs of size 5 from a Normal population with mean pw = 25 
and standard deviation o = 7. For each sample, compute the sample median, vari- 
ance, and standard deviation. Figure 15.7 displays histograms of the 1000 sample 
results. These histograms show the sampling distributions of the three statistics. 
The sampling distribution of the sample median is symmetric, centered at 25, and 
approximately Normal. The sampling distribution of the sample variance is strongly 
skewed to the right. The sampling distribution of the sample standard deviation is 
very slightly skewed to the right. m 


The distribution of the 
1000 sample medians 
for samples of size 5. 


FIGURE 15.7 

(a) The distribution of 1000 sample 

medians for samples of size 5 from a 

Normal population with w = 25 and 

o = 7. (b) The distribution of 1000 

sample variances for samples of size 5 

from a Normal population with w = 25 

and o@ = 7. (c) The distribution of 

45 1000 sample standard deviations for 
samples of size 5 from a Normal popu- 
lation with w = 25 anda = 7. 


362, CHAPTER 15 @ Sampling Distributions 


FIGURE 15.7 
(Continued) 


The distribution of the 
1000 sample variances 
for samples of size 5. 


T T 
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The distribution of the 
1000 sample standard 
deviation for samples 
of size 5. 
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The sampling distribution of a sample statistic is determined by the particular 
sample statistic we are interested in, the distribution of the population of individual 
values from which the sample statistic is computed, and the method by which 
samples are selected from the population. The sampling distribution allows us to 
determine the probability of observing any particular value of the sample statistic in 
another such sample from the population. Let’s look at one application of this idea 
to a concept we first introduced in Chapter 9. 

In Chapter 9 (page 238) we stated that we can use the laws of probability to 
learn if an observed treatment effect is larger than we would expect if only chance 
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were operating. We said that an observed effect that is so large that it would 
rarely occur by chance is called statistically significant. How does one determine 
whether an observed effect would rarely occur if only chance were operating? By 
“only chance operating” we mean “under the assumption that there is no treatment 
effect.” If there is no treatment effect, the response of any subject will be the same 
regardless of whether the subject received the treatment. If the observed treatment 
effect is represented by the value of a sample statistic—for example, the mean of 
the responses of those in a treatment group minus the mean of those in a control 
group—this can be done by considering the sampling distribution of the sample 
statistic under the assumption that no effect is actually present. Use this sampling 
distribution to determine the probability that we would observe values as extreme 
as we actually observed. The next two examples illustrate this calculation. 


Do adults aged 65 and older have palates similar to other adults? Are they just as 
capable as other adults of appreciating a fine wine? Suppose we take an SRS of five 
adults from the population of all adults aged 65 and older. We present them with 
both natural wine and the same wine spiked with DMS at different concentrations to 
find the lowest concentration (odor threshold) at which they identify the spiked wine. 
We compute the median of the odor thresholds for the five subjects and find the 
median to be 35. If the palates of those aged 65 and older are the same as those 
of other adults (in other words, there is no “treatment effect” where the treatment is 
being aged 65 and older), we would expect the sample median to be similar to those 
computed from SRSs of size 5 from the population of all adults. We constructed this 
sampling distribution in Example 15.9. Looking at Figure 15.7(a), we see that 35 is 
an unusually large value for the median, if the palates of adults aged 65 and older 
are the same as other adults. Thus, the observed median of 35 might be regarded 
as evidence that the palates of adults aged 65 and over are different from those of 
adults in the general population. m 


We can use the same type of reasoning to determine statistical significance for 
statistics computed from completely randomized designs. 


Example 12.8 (page 287) discusses a weight-loss experiment with six subjects, Ari, 
Luis, Troy, Ana, Deb, and Hui. Three are assigned completely at random to a new 
experimental weight-loss treatment and three to a placebo. If the new treatment has 
no effect, then the weight lost by any subject will be the same regardless of whether 
the subject received the new treatment or a placebo. Here are the observed weight 
losses (in pounds) for each subject: 


Subject Ari Luis Troy Ana Deb Hui 
Weight Loss 2 15 8 1 12 9 


We emphasize that we would have observed these same weight losses regardless of 
who was assigned the treatment and who the placebo. 

The observed treatment effect is the mean weight lost by those receiving the new 
treatment minus the mean weight lost by those receiving the placebo. If there is no 
difference between the treatment and the placebo, the observed treatment effect is the 


364 


CHAPTER 15 @ Sampling Distributions 


result of the “luck of the draw,” namely which three subjects just happened, by chance, 
to be assigned to the treatment group. Here are all 20 possible ways of selecting three 
of these subjects for the treatment group (the remaining three are in the placebo group). 


Treatment Group 


Treatment Group 


Ari, Luis, Troy 


Luis, Troy, Ana 


Ari, Luis, Ana 


Luis, Troy, Deb 


Ari, Luis, Deb 


Luis, Troy, Hui 


Ari, Luis, Hui 


Luis, Ana, Deb 


Ari, Troy, Ana 


Luis, Ana, Hui 


Ari, Troy, Deb 


Luis, Deb, Hui 


Ari, Troy, Hui 


Troy, Ana, Deb 


Ari, Ana, Deb Troy, Ana, Hui 
Ari, Ana, Hui Troy, Deb, Hui 
Ari, Deb, Hui Ana, Deb, Hui 


With a completely randomized design, each of these 20 possible treatment groups 
is equally likely, thus each has probability 1/20 of being the actual group assigned 
to the treatment. We can use this information to determine the sampling distri- 
bution of the possible observed treatment effects. For example, if Ari, Luis, and 
Troy are assigned to the treatment group, the mean weight loss for the group is 
e+ 1° +8 — 833, The mean weight loss for the placebo group (Ana, Deb, and Hui) 
is then +++2+* = 7.33. The difference in mean weight losses (treatment group 
weight loss minus control group weight loss) is 8.33 — 7.33 = 1.00. 

We can repeat this calculation for each possible assignment of subjects to experi- 
mental groups. For the 20 possible ways we can assign subjects to the treatment and 
placebo groups, the differences in the mean weight losses for the two groups are: 


Difference in 


Treatment Group Control Group Mean Weight Loss 


Ari, Luis, Troy Ana, Deb, Hui 1.00 
Ari, Luis, Ana Troy, Deb, Hui = 3,07 
Ari, Luis, Deb Troy, Ana, Hui 3.67 
Ari, Luis, Hui Troy, Ana, Deb 1.67 
Ari, Troy, Ana Luis, Deb, Hui —8.33 
Ari, Troy, Deb Luis, Ana, Hui = ILC 
Ari, Troy, Hui Luis, Ana, Deb —3.00 
Ari, Ana, Deb Luis, Troy, Hui 6,33 
Ari, Ana, Hui Luis, Troy, Deb —7.67 
Ari, Deb, Hui Luis, Troy, Ana =0,33 
Luis, Troy, Ana Ari, Deb, Hui 0.33 
Luis, Troy, Deb Ari, Ana, Hui 1ST 
Luis, Troy, Hui Ari, Ana, Deb 6.33 
Luis, Ana, Deb Ari, Troy, Hui 3.00 
Luis, Ana, Hui Ari, Troy, Deb 1.00 
Luis, Deb, Hui Ari, Troy, Ana 8.33 
Troy, Ana, Deb Ari, Luis, Hui —1.67 
Troy, Ana, Hui Ari, Luis, Deb —3.67 
Troy, Deb, Hui Ari, Luis, Ana 3.67 
Ana, Deb, Hui Ari, Luis, Troy aa OO 
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Ordering the differences in mean weight losses from low to high, recalling each 
assignment of subjects to the treatment groups has probability 1/20 = 0.05, and 
combining duplicates, we obtain: 


Wisin boss | 8.333 =1467 =O33 =F) =300 =i? =100 =033 
Probability 0.05 0.05 0.05 0.10 0.05 0.05 0.10 0.05 
Weight Loss | 0.33 1.00 167 3.00 3.67 6.33 Ol 8.33) 
Probability 0.05 0.10 0.05 0.10 0.10 0.05 0.10 0.05 


This is the sampling distribution of the differences in mean weight losses for the two 
groups under the assumption that weight lost by a subject does not depend on the 
group to which the subject was assigned. From this sampling distribution we can 
determine whether the actual treatment effect observed is statistically significant. m 


What would we conclude if we conducted this experiment and Troy, Deb, and 
Hui were assigned to the group receiving the new treatment? In this case, the 
observed treatment effect (the difference in mean weight lost for the two groups) 
would be 3.67. The mean weight lost for the treatment group is greater than for 
the placebo group. So this might be regarded as evidence that the new treatment 
is effective. However, if there is no difference in the effects of the new treatment 
and a placebo, Example 15.10 suggests that the chance of observing a difference as 
large as or larger than 3.67 is 0.3. Would you regard an effect that has probability 
0.3 of occurring by chance rare? If not, then you would not consider the results to 
be statistically significant. 


(4) LaunchPad Online Resources 


* The StatBoards video, Simulating a Sampling Distribution, discusses examples 
of simulating sampling distributions. 


Apply Your Knowledge 


15.14 Statistical Significance of a Variance. In Example 15.9, we constructed 
the sampling distribution (see Figure 15.7 (b)) of the sample variance of an 
SRS of size 5 from the distribution of DMS odor thresholds of individual 
adults. We assume that this population distribution is Normal with mean 
p = 25 micrograms per liter and standard deviation 0 = 7 micrograms per 
liter. In Example 15.10, we selected an SRS of size 5 from the members of a 
national wine-tasting organization. Suppose the variance of this SRS is 9.2. 
Would you consider this value to be statistically significant, if the palates of 
the members of the wine-tasting organization are no different than those of 
all adults? Explain your answer. 


15.15 Statistical Significance from a Sampling Distribution. In Exercise 15.7, 
you generated 10 samples of size 4 from the population of 10 students, calcu- 
lated x for each sample, and constructed a histogram of these 10 values of x. 


(a) Use the Simple Random Sample applet to generate 25 new samples of 
size 4, calculate x for each, and construct a histogram of the 25 values. 
Once again, you are constructing the sampling distribution of x. 

(b) Based on your histogram in part (a), what would you estimate to be 


the chance of obtaining a simple random sample of four students with 
x= 76? 
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(c) Suppose you learn that students 1, 3, 5, and 7 are honors students. 
Would you regard their mean score to be “statistically significant”? 


15.16 A Completely Randomized Design and Statistical Significance. Eight 
students volunteer for an experiment to compare two methods (A and B) 
for preparing for the Mathematics SAT exam. Four students are assigned 
completely at random to Method A, and the other four to Method B. All 
eight take the Mathematics SAT, study using the assigned method for three 
months, and retake the exam. The change in score (score after three months 
using the assigned method minus initial score) for all eight students is: 


Student 1 2 3 4 5) 6 7 8 
Change in Score 0 0 0 0 24 24 24 24 


Suppose that there is no difference in the effects of the two methods, so that 
the change in scores for the eight students would be the same regardless of 
which method they were assigned. 


(a) Use the Simple Random Sample applet to assign students to the two 
methods. Compute the difference in mean scores for those you assigned 
to Method A minus those you assigned to Method B. 


(b) Repeat part (a) 24 more times. Make a histogram of the 25 differences 
in mean scores. 


(c) Based on your histogram in part (b), what would you estimate to be the 
probability of observing a difference in mean scores of 12 or greater? 


(d) Suppose you didn’t know whether the two methods had the same effect 
on Mathematics SAT scores. Suppose you conducted the experiment 
and students 4, 5, 6, and 7 were assigned to Method A. The difference 
in the mean scores of students assigned to Method A minus the mean 
scores of those assigned to Method B is 12. Should this result be 
regarded as evidence that Method A is better than Method B.? Explain. 


CHAPTER 15 SUMMARY 


Chapter Specifics 


A parameter in a statistical problem is a number that describes a population, such as 
the population mean yp. To estimate an unknown parameter, use a statistic calculated 
from a sample, such as the sample mean x. 


The law of large numbers states that the actual observed mean outcome xX must 
approach the mean yp of the population as the number of observations increases. 


The population distribution of a variable describes the values of the variable for all 
individuals in a population. 


The sampling distribution of a statistic describes the values of the statistic in all pos- 
sible samples of the same size from the same population. 


When the sample is an SRS from the population, the mean of the sampling distribution 
of the sample mean x is the same as the population mean p. That is, x is an unbiased 
estimator of ju. 


The standard deviation of the sampling distribution of x is a/ ‘Vn for an SRS of size 
n if the population has standard deviation a. That is, averages are less variable than 
individual observations. 


mu When the sample is an SRS from a population that has a Normal distribution, the 
sample mean x also has a Normal distribution. 


m= Choose an SRS of size n from any population with mean p and finite standard devia- 
tion o. The central limit theorem states that when n is large, the sampling distribu- 
tion of x is approximately Normal. That is, averages are more Normal than individual 
observations. We can use the N(w, o/ Vn) distribution to calculate approximate prob- 
abilities for events involving x. 


Link It 


As we mentioned in Chapter 12, probability is the tool we will use to generalize from data 
produced by random samples and randomized comparative experiments to some wider 
population. In this chapter we begin to formalize this process. We use a statistic to estimate 
an unknown parameter. We use the sampling distribution to summarize the behavior of a 
statistic in all possible random samples of the same size from a population. 

More specifically, in this chapter we begin to think about how a sample mean, x, can 
provide information about a population mean, ». When the sample mean is computed 
from an SRS drawn from a large population, its sampling distribution has properties that 
help us understand how the sample mean can be used to draw conclusions about a popula- 
tion mean. The law of large numbers tells us that a sample mean computed from a random 
sample from some population gets closer and closer to the population mean as the sample 
size increases. The central limit theorem describes the sampling distribution of the sample 
mean for “large” SRSs and allows us to make probability statements about possible values 
of the sample mean. Beginning with Chapter 16, we will develop specific methods for draw- 
ing conclusions about a population mean based on a sample mean computed from an SRS. 
These methods will use the tools developed in this chapter. 


(9) LaunchPad Online Resources 


If you are having difficulty with any of the sections of this chapter, these online 
resources should help prepare you to solve the exercises at the end of this chapter. 


° StatTutor starts with a video review of each section and asks a series of 
questions to check your understanding. 


* LearningCurve provides you with a series of questions about the chapter 
geared to your level of understanding. 


Summary 
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15.17 


15.18 


15.19 


15.20 


15.21 


The Bureau of Labor Statistics announces that last 
month it interviewed all members of the labor force 
in asample of 60,000 households; 6.7% of the people 
interviewed were unemployed. The boldface number 
isa 

(a) sampling distribution. 

(b) statistic. 

(c) parameter. 


An April 2011 poll of Canadian adults found that 
57% were certain that they would vote in the May 
2011 elections. Election records show that 61.1% of 
registered voters voted in the election. The boldface 
number is a 

(a) sampling distribution. 

(b) statistic. 

(c) parameter. 


Annual returns on common small stocks available to 
investors vary a lot. In a recent year, the mean return 
was 11.7% and the standard deviation of returns was 
34.1%. The law of large numbers says that 


(a) you can get an average return higher than the 
mean 11.7% by investing in a large number of stocks. 
(b) as you invest in more and more stocks chosen 
at random, your average return on these stocks gets 
ever closer to 11.7%. 

(c) if you invest in a large number of stocks chosen 
at random, your average return will have approxi- 
mately a Normal distribution. 


Scores on the Critical Reading part of the SAT exam 
in a recent year were roughly Normal with mean 496 
and standard deviation 115. You choose an SRS of 
100 students and average their SAT Critical Read- 
ing scores. If you do this many times, the mean of the 
average scores you get will be close to 


(a) 496. 
(b) 496/100 = 4.96. 
(c) 496/V 100 = 49.6. 


Scores on the Critical Reading part of the SAT 
exam in a recent year were roughly Normal with 
mean 496 and standard deviation 115. You choose 
an SRS of 100 students and average their SAT 


15.22 


15.23 


15.24 


Critical Reading scores. If you do this many times, 
the standard deviation of the average scores you get 
will be close to 


(G2) ILS), 
(b) 115/100 = 1.15. 
(c) 115/100 = 11.5. 


A newborn baby has extremely low birth weight 
(ELBW) if it weighs less than 1000 grams. A study of 
the health of such children in later years examined a 
random sample of 219 children. Their mean weight 
at birth was x = 810 grams. This sample mean is an 
unbiased estimator of the mean weight py in the popu- 
lation of all ELBW babies. This means that 


(a) in many samples from this population, the mean 
of the many values of x will be equal to pw. 

(b) as we take larger and larger samples from this 
population, x will get closer and closer to p. 

(c) in many samples from this population, the many 
values of x will have a distribution that is close to 
Normal. 


The number of hours a battery lasts before failing 
varies from battery to battery. The distribution of 
failure times follows an exponential distribution (see 
Example 15.8), which is strongly skewed to the right. 
The central limit theorem says that 


(a) as we look at more and more batteries, their 
average failure time gets ever closer to the mean pL 
for all batteries of this type. 

(b) the average failure time of a large number 

of batteries has a distribution of the same shape 
(strongly skewed) as the distribution for individual 
batteries. 

(c) the average failure time of a large number of 
batteries has a distribution that is close to Normal. 


The length of human pregnancies from concep- 
tion to birth varies according to a distribution that 
is approximately Normal with mean 266 days and 
standard deviation 16 days. The probability that the 
average pregnancy length for six randomly chosen 
women exceeds 270 days is about 


(a) 0.40. (b) 0.27. (@) On. 


| CHAPTER 15 EXERCISES 


15.25 


Testing glass. How well materials conduct heat mat- 
ters when designing houses. As a test of a new mea- 
surement process, 10 measurements are made on 
pieces of glass known to have conductivity 1. The 
average of the 10 measurements is 1.07. For each 
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of the boldface numbers, indicate whether it is a 
parameter or a statistic. Explain your answer. 


Statistics anxiety. What can teachers do to allevi- 
ate statistics anxiety in their students? To explore 
this question, statistics anxiety for students in two 


15.27 


15.28 


15.29 


classes was compared. In one class, the instructor 
lectured in a formal manner, including dressing 
formally. In the other, the instructor was less for- 
mal, dressed informally, was more personal, used 
humor, and called on students by their first names. 
Anxiety was measured using a questionnaire. 
Higher scores indicate a greater level of anxiety. 
The mean anxiety score for students in the formal 
lecture class was 25.40; in the informal class, the 
mean was 20.41. For each of the boldface numbers, 
indicate whether it is a parameter or a statistic. 
Explain your answer. 


Roulette. A roulette wheel has 38 slots, of which 
18 are black, 18 are red, and 2 are green. When the 
wheel is spun, the ball is equally likely to come to 
rest in any of the slots. One of the simplest wagers 
chooses red or black. A bet of $1 on red returns $2 
if the ball lands in a red slot. Otherwise, the player 
loses his dollar. When gamblers bet on red or black, 
the two green slots belong to the house. Because 
the probability of winning $2 is 18/38, the mean 
payoff from a $1 bet is twice 18/38, or 94.7 cents. 
Explain what the law of large numbers tells us about 
what will happen if a gambler makes very many bets 
on red. 


The Medical College Admission Test. Almost all 
medical schools in the United States require stu- 
dents to take the Medical College Admission Test 
(MCAT). To estimate the mean score pw of those 
who took the MCAT on your campus, you will 
obtain the scores of an SRS of students. The scores 
follow a Normal distribution, and from published 
information you know that the standard deviation 
is 6.5. Suppose that (unknown to you) the mean 
score of those taking the MCAT on your campus 
is 25.0. 


(a) If you choose one student at random, what is 
the probability that the student’s score is between 
20 and 30? 

(b) You sample 25 students. What is the sampling 
distribution of their average score x? 

(c) What is the probability that the mean score of 
your sample is between 20 and 30? 


Glucose testing. Shelia’s doctor is concerned that 
she may suffer from gestational diabetes (high 
blood glucose levels during pregnancy). There is 
variation both in the actual glucose level and in 
the blood test that measures the level. In a test to 
screen for gestational diabetes, a patient is clas- 
sified as needing further testing for gestational 
diabetes if the glucose level is above 130 milli- 
grams per deciliter (mg/dL) one hour after hav- 
ing a sugary drink. Shelia’s measured glucose level 
one hour after the sugary drink varies according to 


15.30 


15.31 


15.32 


15.33 
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the Normal distribution with w = 122 mg/dL and 
o = 12 mg/dL. 


(a) Ifa single glucose measurement is made, what is 
the probability that Shelia is diagnosed as needing 
further testing for gestational diabetes? 

(b) If measurements are made on four separate days 
and the mean result is compared with the criterion 
130 mg/dL, what is the probability that Shelia is 
diagnosed as needing further testing for gestational 
diabetes? 


Daily activity. It appears that people who are 
mildly obese are less active than leaner people. One 
study looked at the average number of minutes per 
day that people spend standing or walking. Among 
mildly obese people, the mean number of minutes 
of daily activity (standing or walking) is approxi- 
mately Normally distributed with mean 373 min- 
utes and standard deviation 67 minutes. The mean 
number of minutes of daily activity for lean people 
is approximately Normally distributed with mean 
526 minutes and standard deviation 107 minutes. 
A researcher records the minutes of activity for an 
SRS of five mildly obese people and an SRS of five 
lean people. 


(a) What is the probability that the mean number 
of minutes of daily activity of the five mildly obese 
people exceeds 420 minutes? 

(b) What is the probability that the mean number 
of minutes of daily activity of the five lean people 
exceeds 420 minutes? 


Glucose testing, continued. Shelia’s measured 
glucose level one hour after having a sugary drink 
varies according to the Normal distribution with 
mw = 122 mg/dL and o = 12 mg/dL. What is the 
level L such that there is probability only 0.05 that 
the mean glucose level of four test results falls above 
L? (Hint: This requires a backward Normal cal- 
culation. See page 91 in Chapter 3 if you need to 
review.) 


Pollutants in auto exhausts. Light vehicles sold in 
the United States must emit an average of no more 
than 0.05 grams per mile (g/mi) of nitrogen oxides 
(NOX) after 50,000 or fewer miles of driving. NOX 
emissions after 50,000 or fewer miles of driving for 
one car model vary Normally with mean 0.03 g/mi 
and standard deviation 0.01 g/mi. 

(a) What is the probability that a single car of this 
model emits more than 0.05 g/mi of NOX? 

(b) A company has 25 cars of this model in its fleet. 
What is the probability that the average NOX level 
x of these cars is above 0.05 g/mi? 


Runners. Ina study of exercise, a large group of male 
runners walk on a treadmill for six minutes. After 
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this exercise, their heart rates vary with mean 8.8 
beats per five seconds and standard deviation 1.0 
beats per five seconds. This distribution takes only 
whole-number values, so it is certainly not Normal. 


Bruce Laurance/Getty Images 


(a) Let x be the mean number of beats per five 
seconds after measuring heart rate for 24 five-second 
intervals (two minutes). What is the approximate dis- 
tribution of x according to the central limit theorem? 
(b) What is the approximate probability that x is 
less than 8? 

(c) What is the approximate probability that the 
heart rate of a runner is less than 100 beats per min- 
ute? (Hint: Restate this event in terms of x.) 
Pollutants in auto exhausts, continued. The level 
of nitrogen oxides (NOX) in the exhaust after 50,000 
miles or fewer of driving of cars of a particular model 
varies Normally with mean 0.03 g/mi and standard 
deviation 0.01 g/mi. A company has 25 cars of this 
model in its fleet. What is the level L such that the 
probability that the average NOX level x for the fleet 
is greater than L is only 0.01? (Hint: This requires a 
backward Normal calculation. See page 91 in Chap- 
ter 3 if you need to review.) 


Returns on stocks. Andrew plans to retire in 40 
years. He plans to invest part of his retirement funds 
in stocks, so he seeks out information on past returns. 
He learns that from 1964 to 2013, the annual returns 
on U.S. common stocks had mean 13.3% and stan- 
dard deviation 17.0%.° The distribution of annual 
returns on common stocks is roughly symmetric, 
so the mean return over even a moderate number 
of years is close to Normal. What is the probability 
(assuming that the past pattern of variation contin- 
ues) that the mean annual return on common stocks 
over the next 40 years will exceed 10%? What is the 
probability that the mean return will be less than 
5%? Follow the four-step process as illustrated in 
Example 15.8. 
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Airline passengers get heavier. In response to the 
increasing weight of airline passengers, the Federal 
Aviation Administration (FAA) in 2003 told airlines 
to assume that passengers average 190 pounds in the 
summer, including clothing and carry-on baggage. 
But passengers vary, and the FAA did not specify a 
standard deviation. A reasonable standard deviation 
is 35 pounds. Weights are not Normally distributed, 
especially when the population includes both men 
and women, but they are not very non-Normal. A 
commuter plane carries 22 passengers. What is the 
approximate probability that the total weight of the 
passengers exceeds 4500 pounds? Use the four-step 
process to guide your work. (Hint: To apply the cen- 
tral limit theorem, restate the problem in terms of 
the mean weight.) 


© Jeff Greenberg/The Image Works 


Sampling students. To estimate the mean score pw 
of those who took the Medical College Admission 
Test on your campus, you will obtain the scores of 
an SRS of students. From published information, you 
know that the scores are approximately Normal with 
standard deviation about 6.5. How large an SRS 
must you take to reduce the standard deviation of 
the sample mean score to 1? 


Sampling students, continued. To estimate the 
mean score pw of those who took the Medical College 
Admission Test on your campus, you will obtain the 
scores of an SRS of students. From published infor- 
mation, you know that the scores are approximately 
Normal with standard deviation about 6.5. You want 
your sample mean x to estimate jz with an error of no 
more than | point in either direction. 


(a) What standard deviation must x have so that 
99.7% of all samples give an X within 1 point of w? 
(Use the 68—95-99.7 rule.) 

(b) How large an SRS do you need in order to 
reduce the standard deviation of x to the value you 
found in part (a)? 

Playing the numbers. The numbers racket is a well- 
entrenched illegal gambling operation in most large 
cities. One version works as follows: you choose one 
of the 1000 three-digit numbers 000 to 999 and pay 
your local numbers runner a dollar to enter your 
bet. Each day, one three-digit number is chosen at 
random and pays off $600. The mean payoff for the 
population of thousands of bets is 4 = 60 cents. Joe 
makes one bet every day for many years. Explain 
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what the law of large numbers says about Joe’s results 
as he keeps on betting. 


Playing the numbers: a gambler gets chance out- 
comes. The law of large numbers tells us what hap- 
pens in the long run. Like many games of chance, the 
numbers racket has outcomes so variable—one three- 
digit number wins $600 and all others win nothing— 
that gamblers never reach “the long run.” Even after 
many bets, their average winnings may not be close 
to the mean. For the numbers racket, the mean payout 
for single bets is $0.60 (60 cents) and the standard 
deviation of payouts is about $18.96. If Joe plays 
350 days a year for 40 years, he makes 14,000 bets. 


(a) What are the mean and standard deviation 

of the average payout x that Joe receives from his 
14,000 bets? 

(b) The central limit theorem says that his average 
payout is approximately Normal with the mean and 
standard deviation you found in part (a). What 

is the approximate probability that Joe’s average 
payout per bet is between $0.50 and $0.70? You 

see that Joe’s average may not be very close to the 
mean $0.60 even after 14,000 bets. 


Playing the numbers: the house has a business. 
Unlike Joe (see Exercise 15.40) the operators of the 
numbers racket can rely on the law of large numbers. 
It is said that the New York City mobster Casper 
Holstein took as many as 25,000 bets per day in the 
Prohibition era. That’s 150,000 bets in a week if he 
takes Sunday off. Casper’s mean winnings per bet 
are $0.40 (he pays out 60 cents of each dollar bet 
to people like Joe and keeps the other 40 cents.) His 
standard deviation for single bets is about $18.96, 
the same as Joe’s. 


NY Daily News Archive via Getty Images 


(a) What are the mean and standard deviation of 
Casper’s average winnings x on his 150,000 bets? 
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(b) According to the central limit theorem, what is 
the approximate probability that Casper’s average 
winnings per bet are between $0.30 and $0.50? After 
only a week, Casper can be pretty confident that his 
winnings will be quite close to $0.40 per bet. 


Can we trust the central limit theorem? The cen- 
tral limit theorem says that “when n is large” we can 
act as if the distribution of a sample mean x is close 
to Normal. How large a sample we need depends 
on how far the population distribution is from being 
Normal. Example 15.8 shows that we can trust this 
Normal approximation for quite moderate sample 
sizes even when the population has a strongly skewed 
continuous distribution. 

The central limit theorem requires much larger 
samples for Joe’s bets with his local numbers racket. 
The population of individual bets has a finite dis- 
tribution with only two possible outcomes: $600 
(probability 0.001) and $0 (probability 0.999). This 
distribution has mean ws = 0.6 and standard devia- 
tion about o = 18.96. With more math and good 
software, we can find exact probabilities for Joe’s 
average winnings. 


(a) If Joe makes 14,000 bets, the exact probability 
P(0.5 =x = 0.7) = 0.4961. How accurate was 
your Normal approximation from part (b) of 
Exercise 15.40? 

(b) If Joe makes only 3500 bets, 

P(0.5 =x = 0.7) =0.4048. How accurate is the 
Normal approximation for this probability? 

(c) If Joe and his buddies make 150,000 bets, 
P(0.5 =x = 0.7) = 0.9629. How accurate is the 


Normal approximation? 


What’s the mean? Suppose that you roll three bal- 
anced dice. We wonder what the mean number of 
spots on the up-faces of the three dice is. The law 
of large numbers says that we can find out by expe- 
rience: roll three dice many times, and the average 
number of spots will eventually approach the true 
mean. Set up the Law of Large Numbers applet to 
roll three dice. Don’t click “Show mean” yet. Roll 
the dice until you are confident you know the mean 
quite closely, then click “Show mean” to verify your 
discovery. What is the mean? Make a rough sketch 
of the path the averages x followed as you kept add- 
ing more rolls. 


Statistical significance? Look again at Exercise 
12.50 (page 301). If your Canadian friend cor- 
rectly identifies all three cups with the milk added 
first, would you regard the result as statistically 
significant? 
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15.45 Online videos. There are several online videos of the law of large numbers and 
central limit theorem. Locate one such video, watch it, and write a brief summary 
of the video. We did a Google search of “videos for law of large numbers” and 
eee : : ” . 

videos for the central limit theorem” and found several links. 


15.46 Work the law of large numbers. Read the online article atezinearticles. 
com/?Work-The-Law-of-Large-Numbers-But-Remember-It- 
Only-Takes-One-to-Succeed! &i1d=932026. Does this article accurately 
describe the law of large numbers? Explain your answer. 


© Ariel Skelley/age fotostock 


Confidence Intervals: 
The Basics 


fter we have selected a sample, we know the responses of the individuals in 

the sample. The usual reason for taking a sample is not to learn about the 

individuals in the sample but to infer from the sample data some conclusion 
about the wider population that the sample represents. 


Statistical Inference 


Statistical inference provides methods for drawing conclusions about a population 
from sample data. 


Because a different sample might lead to different conclusions, we can’t be 
certain that our conclusions are correct. Statistical inference uses the language of 
probability to say how trustworthy our conclusions are. This chapter introduces one 
of the two most common types of inference, confidence intervals for estimating the 
value of a population parameter. The next chapter discusses the other common type 
of inference, tests of significance for assessing the evidence for a claim about a popu- 
lation parameter. Both types of inference are based on the sampling distributions of 
statistics. That is, both use probability to say what would happen if we applied the 
inference method many times. 


In this chapter 
we cover... 


16.1 


16.2 


16.3 


16.4 


The reasoning of statistical 


estimation 


Margin of error and confidence 


level 


Confidence intervals for a 


population mean 


How confidence intervals 


behave 
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int RANGES ARE 

ii) FOR STATISTICS? 
Many people like to think that 
statistical estimates are exact. The 
Nobel Prize-winning economist 
Daniel McFadden tells a story of his 
time on the Council of Economic 
Advisers. Presented with a range 
of forecasts for economic growth, 
President Lyndon Johnson replied: 
“Ranges are for cattle; give me one 
number.” 


This chapter presents the basic reasoning of statistical inference. To make the 
reasoning as clear as possible, we start with a setting that is too simple to be realistic. 
Here is the setting for our work in this chapter. 


Simple Conditions for Inference about a Mean 


1. We have an SRS from the population of interest. There is no nonresponse or 
other practical difficulty. The population is large compared to the size of the 
sample. 

2. The variable we measure has an exactly Normal distribution N(q, o) in the 
population. 

3. We don’t know the population mean w. But we do know the population stan- 
dard deviation o. 


The condition that the population is large relative to the size of the sample will 
be satisfied if the population is, say, at least 20 times as large. The conditions that we 
have a perfect SRS, that the population is exactly Normal, and that we know the 
A population o are all unrealistic. Chapter 18 begins to move from the “simple 
conditions” toward the reality of statistical practice. Later chapters deal with 
inference in fully realistic settings. 

If these “simple conditions” are unrealistic, why study them? One reason is that 
under these simple conditions we can apply what we have learned in previous chap- 
ters about the Normal distribution and the sampling distribution of a sample mean 
to develop, step-by-step, methods for inference about a mean. The reasoning used 
under simple conditions applies to more realistic settings where the mathematics is 
more complicated. 

Another reason for studying inference under these simple conditions is that we 
can carry out calculations using what we have already learned about the Normal 
distribution in previous chapters. This includes the calculations used to plan statis- 
tical studies, which we discuss in Chapter 18. Unfortunately, under more realistic 
conditions, such calculations are more complicated, and the connection to previous 
material about the Normal distribution is less clear. 

Although we never know whether a population is exactly Normal and we 
never know the population a, the methods we discuss in this and the next two 
chapters are approximately correct for sufficiently large sample sizes, provided 
we treat the sample standard deviation as though it were the population a. 
Thus, there are situations (admittedly rare) where these methods can be used 
in practice. 


Body mass index (BMI) is used to screen for possible weight problems. It is calcu- 
lated as weight divided by the square of height, measuring weight in kilograms and 
height in meters. Many online BMI calculators allow you to enter weight in pounds 
and height in inches. Adults with BMI less than 18.5 are considered underweight, 
and those with BMI greater than 25 may be overweight. For data about BMI, we 
turn to the National Health and Nutrition Examination Survey (NHANES), a 
continuing government sample survey that monitors the health of the American 
population. 
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An NHANES report gives data for 654 women aged 20 to 29 years.' The mean BMI 7 
of these 654 women was x = 26.8. On the basis of this sample, we want to esti- 
mate the mean BMI w in the population of all 20.6 million women in this age group. 
To match the “simple conditions,” we will treat the NHANES sample as an SRS 
from a Normal population with known standard deviation @ = 7.5. 
Here is the reasoning of statistical estimation in a nutshell: 


1. To estimate the unknown population mean BMI mw, use the mean x = 26.8 of 
the random sample. We don’t expect x to be exactly equal to 4, so we want to 
say how accurate this estimate is. 


Thomas Northcut/Getty Images 


2. We know the sampling distribution of x. In repeated samples, x has the Normal 
distribution with mean w and standard deviation o/ Vn. So the average BMI x 
of an SRS of 654 young women has standard deviation 


AD 
VE = Sea = 0.3 (rounded off) 


3. The 95 part of the 68—95-99.7 rule for Normal distributions says that x is within 
0.6 (that’s two standard deviations) of the mean yp in 95% of all samples. That 
is, for 95% of all samples of size 654, the distance between the sample mean 
X and the population mean yp is less than 0.6. So if we estimate that pw lies 
somewhere in the interval from x — 0.6 to x + 0.6, we'll be right for 95% of 
all possible samples. For this particular sample, this interval is 


X — 0.6 = 26.8 — 0.6 = 26.2 
to 
X + 0.6 = 26.8 + 0.6 = 27.4 


4. Because we got the interval 26.2 to 27.4 from a method that captures the 
population mean for 95% of all possible samples, we say that we are 95% con- 
fident that the mean BMI w of all young women is some value in that interval, 
no lower than 26.2 and no higher than 27.4. = 


The big idea is that the sampling distribution of x tells us how close to ws the sam- 
ple mean x is likely to be. Statistical estimation just turns that information around 
to say how close to x the unknown population mean yp is likely to be. We call the 
interval of numbers between the values xX + 0.6 a 95% confidence interval for jw. 


(9) LaunchPad Online Resources 


* The Snapshots video, Statistical Inference, provides a good general introduc- 
tion to statistical inference with several interesting examples. 


Apply Your Knowledge 


16.1 Number Skills of Eighth Graders. The National Assessment of Edu- 
cational Progress (NAEP) includes a mathematics test for eighth-grade 
students.” Scores on the test range from 0 to 500. Demonstrating the ability 
to use the mean to solve a problem is an example of the skills and knowledge 
associated with performance at the Basic level. An example of the knowledge 
and skills associated with the Proficient level is being able to read and inter- 
pret a stem-and-leaf plot. 
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In 2013, 170,100 eighth-graders were in the NAEP sample for the math- 
ematics test. The mean mathematics score was x = 285. We want to esti- 
mate the mean score yp in the population of all eighth-graders. Consider the 


NAEP sample as an SRS from a Normal population with standard deviation 
o = 125. 


(a) Ifwe take many samples, the sample mean x varies from sample to 
sample according to a Normal distribution with mean equal to the 
unknown mean score pw in the population. What is the standard devia- 
tion of this sampling distribution? 


(b) According to the 95 part of the 68—95-99.7 rule, 95% of all values 


of x fall within on either side of the unknown mean pr. What 
is the missing number? 


(c) What is the 95% confidence interval for the population mean score 
pu based on this one sample? 


16.2 Retaking the SAT. An SRS of 400 high school seniors gained an average 
of xX = 12 points in their second attempt at the SAT Mathematics exam. 
Assume that the change in score has a Normal distribution with standard 
deviation @ = 42. We want to estimate the mean change in score p in the 
population of all high school seniors. 


(a) Give a 95% confidence interval for wz based on this sample. 
(b) Based on your confidence interval in part (a), how certain are you that 


the mean change in score yw in the population of all high school seniors 
is greater than 0? (Hint: Does the interval in part (a) include 0?). 


The 95% confidence interval for the mean BMI of young women, based on the 
NHANES sample, is x + 0.6. Once we have the sample results in hand, we know 
that for this sample x = 26.8, so that our confidence interval is 26.8 + 0.6. Most 
confidence intervals have a form similar to this, 


estimate + margin of error 


The estimate (x = 26.8 in our example) is our guess for the value of the unknown 

margin of error __ parameter. The margin of error + 0.6 shows how accurate we believe our guess 
to be, based on the variability of the estimate. We have a 95% confidence interval 
because the interval x + 0.6 catches the unknown parameter in 95% of all possible 
samples. 


Confidence Interval 


A level C confidence interval for a parameter has two parts: 
» An interval calculated from the data, usually of the form 


estimate + margin of error 
» A confidence level C, which gives the probability that the interval will capture 


the true parameter value in repeated samples. That is, the confidence level is the 
success rate for the method. 
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Users can choose the confidence level, usually 90% or higher because we gen- 
erally want to be quite sure of our conclusions. The most common confidence 
level is 95%. 


Interpreting a Confidence Level 


The confidence level is the success rate of the method that produces the interval. We 
don’t know whether the 95% confidence interval from a particular sample is one of 
the 95% that capture w or one of the unlucky 5% that miss. 


To say that we are 95% confident that the unknown yp lies between 26.2 and 27.4 
is shorthand for “We got these numbers using a method that gives correct 
results 95% of the time.” 


Figures 16.1 and 16.2 illustrate the behavior of confidence intervals. Study these 
figures carefully. If you understand what they say, you have mastered one of the big 
ideas of statistics. 

Figure 16.1 illustrates the behavior of the interval x + 0.6 for the mean BMI of 
young women. Starting with the population, imagine taking many SRSs of 654 young 
women. The first sample has x = 26.8, the second has x = 27.0, the third has 
X = 26.2, and so on. The sample mean varies from sample to sample, but when we 
use the formula X + 0.6 to get an interval based on each sample, 95% of these inter- 
vals capture the unknown population mean jz. Notice we use the same margin of error, 
0.6, for each interval because the sample size and standard deviation o are the same. 

Figure 16.2 illustrates the idea of a 95% confidence interval in a different form. 
It shows the result of drawing many SRSs from the same population and calculating 
a 95% confidence interval from each sample. The center of each interval is at x and 
therefore varies from sample to sample. The sampling distribution of x appears at 
the top of the figure to show the long-term pattern of this variation. The population 
mean pw is at the center of the sampling distribution. The 95% confidence intervals 
from 25 SRSs appear underneath. The center x of each interval is marked by a dot. 
The arrows on either side of the dot span the confidence interval. All except one of 
these 25 intervals capture the true value of w. If we take a very large number of 
samples, 95% of the confidence intervals will contain w. = 


SRS n= 654 2 \ 
———___-» WO = 26.8 0.6 
SRSi=105) ; 
ean X+0.6=270+0.6 95% of these intervals 
SRS n= 654. ‘% capture the unknown mean 
> X +0.6=262+0.6 u of the population. 
e e 
e e 
ws Se S e 2 
POPULATION MANY SRSs MANY CONFIDENCE 
mean w unknown, INTERVALS 
std. dev.o = 75 
FIGURE 16.1 


To say that X + 0.6 is a 95% confidence interval for the population mean p is to say that, in repeated samples, 95% of these 
intervals capture. 
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This interval misses the true 
uu. The others all capture w. 


Sampling distribution of x 


<< Values of x ————> 


FIGURE 16.2 
Twenty-five samples from the same population gave these 95% confidence intervals. In the long run, 95% of all samples 
give an interval that contains the population mean wp. 


The Confidence Intervals applet creates figures similar to Figure 16.2. You can use 
the applet to watch confidence intervals from one sample after another capture or 
fail to capture the true parameter. 


(3) LaunchPad Online Resources 


* The Snapshots video, Confidence Intervals, discusses confidence intervals 
in the context of an example: the impact of wind turbines on raptor 
fatalities. 


Apply Your Knowledge 


16.3 Confidence Intervals in Action. The idea of an 80% confidence interval 
is that, in 80% of all samples, the method produces an interval that captures 
the true parameter value. That’s not high enough confidence for practical 
use, but 80% hits and 20% misses make it easy to see how a confidence 
interval behaves in repeated samples from the same population. Go to the 
Confidence Intervals applet. 


16.3 Confidence Intervals for a Population Mean 


(a) Set the confidence level to 80% and the sample size to 50. Click 
“Sample” to choose an SRS and calculate the confidence interval. Do 
this 10 times to simulate 10 SRSs with their 10 confidence intervals. 
How many of the 10 intervals captured the true mean 4? How many 
missed? 


(b) You see that we can’t predict whether the next sample will hit or miss. 
The confidence level, however, tells us what percent will hit in the 
long run. Reset the applet and click “Sample 25” to get the confidence 
intervals from 25 SRSs. How many hit? 


(c) Click “Sample 25” repeatedly and write down the number of hits each 
time. What was the percent of hits among 100, 200, 300, 400, 500, 
600, 700, 800, and 1000 SRSs? Even 1000 samples is not truly “the 
long run,” but we expect the percent of hits in 1000 samples to be 
fairly close to the confidence level, 80%. 


16.4 Losing Weight. A Gallup Poll in November 2013 found that 51% of the 
people in the sample said they want to lose weight. Gallup announced, “For 
results based on the total sample of national adults, one can say with 95% 
confidence that the maximum margin of sampling error is +4 percentage 
points.” 


(a) What is the 95% confidence interval for the percent of all adults who 
want to lose weight? 
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(b) What does it mean to have 95% confidence in this interval? 


In the setting of Example 16.1, we outlined the reasoning that leads to a 95% 
confidence interval for the unknown mean p of a population. Now we will reduce 
the reasoning to a formula. 

In Example 16.1, what role did 95% play in determining the confidence interval? 
To find a 95% confidence interval for the mean BMI of young women, we first 
caught the central 95% of the Normal sampling distribution by going out two stan- 
dard deviations in both directions from the mean. The value 95% determined how 
many standard deviations we go out in both directions from the mean to capture 
this central 95%. To find a level C confidence interval, we first catch the central 
area C under the Normal sampling distribution. How many standard deviations 
must we go out in both directions from the mean to capture this central area C? 
Because all Normal distributions are the same in the standard scale, we can obtain 
everything we need from the standard Normal curve. 

Figure 16.3 shows how the central area C under a standard Normal curve is 
marked off by two points z* and —z*. Numbers like z* that mark off specified 
areas are called critical values of the standard Normal distribution. Values of 
v* for many choices of C appear at the bottom of Table C in the back of the 
book, in the row labeled x*. Here are the entries for the most common confi- 
dence levels: 


Confidence Level C 90% 95% 99% 
Critical Value z* 1.645 1.960 2.576 


critical values 


380. CHAPTER 16 @® Confidence Intervals: The Basics 


FIGURE 16.3 

The critical value z* is the number that 
catches central probability C under a 
standard Normal curve between —z* 
and z*. 


Standard 
Normal curve 


ea 


Probability =C 


Probability = (1- C)/2 Probability = (1- C)/2 


You see that, for C = 95%, the table gives x* = 1.960. This is a bit more pre- 
cise than the approximate value y* = 2 based on the 68-95-99.7 rule. You can, 
of course, use software to find critical values x*, as well as the entire confidence 
interval. 

Figure 16.3 shows that there is area C under the standard Normal curve between 
—z* and x*. So any Normal curve has area C within z* standard deviations on 
either side of its mean. The Normal sampling distribution of x has area C within 
z*a/ Vn on either side of the population mean p because it has mean p and stan- 
dard deviation o/ Vn. If we start at X and go out z*a/ Vn in both directions, we get 
an interval that contains the population mean pw in a proportion C of all samples. 
This interval is 


= o - Co 
from x — z*—= to x + x*¥—= 


Vi Vi 


or 


It is a level C confidence interval for pu. 


Confidence Interval 


for the Mean of a Normal Population 


Draw an SRS of size n from a Normal population having unknown mean p and 
known standard deviation a. A level C confidence interval for ju is 


Re ae oe 


ss 


The critical value z* corresponding to the confidence level C is illustrated in Figure 
16.3 and found at the bottom of Table C. 


The steps in finding a confidence interval mirror the overall four-step process for 
organizing statistical problems. 


16.3 Confidence Intervals for a Population Mean 


Confidence Intervals: The Four-Step Process 


STATE: What is the practical question that requires estimating a parameter? 
PLAN: Identify the parameter, choose a level of confidence, and select the type of 
confidence interval that fits your situation. 


SOLVE: Carry out the work in two phases: 


1. Check the conditions for the interval you plan to use. 
2. Calculate the confidence interval. 


CONCLUDE: Return to the practical question to describe your results in this 
setting. 


(STATE?) Does the expectation of good weather lead to more generous behavior? 
Psychologists studied the size of the tip in a restaurant when a message indicating 
that the next day’s weather would be good was written on the bill. Here are tips from 
20 patrons, measured in percent of the total bill:? 


208 187 199 206 21.9 234 228 249 222 20.3 
24.9 22.3 270 204 22.2 240 21.1 22.1 220 22.7 


This is one of three sets of measurements made, the others being tips received when 
the message on the bill said that the next day’s weather would not be good or there 
was no message on the bill. We want to estimate the mean tip for comparison with 
tips under the other conditions. 


-PLAN: We will estimate the mean percentage tip w for all patrons of this restaurant 
when they receive a message on their bill indicating that the next day’s weather will 
be good by giving a 95% confidence interval. The confidence interval just introduced 
fits this situation. 

ESeRVEA We should start by checking the conditions for inference. For this example, 
we will first find the interval and then discuss how statistical practice deals with con- 
ditions that are never perfectly satisfied. 

The mean percentage tip of the sample is x = 22.21. As part of the “simple condi- 
tions,” suppose that, from past experience with patrons of this restaurant, we know 
that the standard deviation of percentage tip is o = 2. For 95% confidence, the 
critical value is z* = 1.960. A 95% confidence interval for px is therefore 
X + z*— = 22.21 + 1,960 —<— 
n V20 
= 22.21 + 0.88 
21.33 to 23.09 


ESONSRUDA We are 95% confident that the mean percentage tip for all patrons of 
this restaurant when their bill contains a message that the next day’s weather will be 
good is between 21.33 and 23.09. m 


In practice, the first part of the “Solve” step is to check the conditions for infer- 
ence. The “simple conditions” are as follows: 


1. SRS: We don’t have an actual SRS from the population of all patrons of this 
restaurant. Scientists often act as if subjects are SRSs if there is nothing special 
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FIGURE 16.4 
Stemplot of the percentage tips in 
Example 16.3. 


about how the subjects were obtained. But it is always better to have an actual 
SRS because otherwise we can never be sure that hidden biases aren’t present. 
This study was actually a randomized comparative experiment in which these 
20 patrons were assigned at random from a larger group of patrons to get one 
of the treatments being compared. 


Normal distribution: The psychologists expect from past experience that 
measurements like this on patrons of the same restaurant under the same 
conditions will follow approximately a Normal distribution. We can’t look at 
the population, but we can examine the sample. Figure 16.4 is a stemplot that 
is roughly bell-shaped, with perhaps a modest outlier but no strong skewness. 
Shapes like this often occur in small samples from Normal populations, so we 
have no reason to doubt that the population distribution is Normal. 


Known o: It really is unrealistic to suppose that we know that o = 2. We will 
see in Chapter 20 that it is easy to do away with the need to know o. 


As this discussion suggests, inference methods are often used when conditions 
like SRS and Normal population are not exactly satisfied. In this introductory 
chapter, we act as though the “simple conditions” are satisfied. In reality, wise use 
of inference requires judgment. Chapter 17 and the later chapters on each inference 
method will give you a better basis for judgment. 


() LaunchPad Online Resources 


* The StatBoards video, Creating and Interpreting Confidence Intervals, provides 


examples of computing and interpreting confidence intervals. 


Apply Your Knowledge 


16.5 Find a Critical Value. The critical value z* for confidence level 75% is 
not in Table C. Use software or Table A of standard Normal probabilities to 
find z*. Include in your answer a sketch like Figure 16.3 with C = 0.75 and 
your critical value y* marked on the axis. 


16.6 Measuring Conductivity. The National Institute of Standards and Technol- 
ogy (NIST) supplies “standard materials” whose physical properties are supposed 
to be known. For example, you can buy from NIST an iron rod whose electrical 
conductivity is supposed to be 10.1 at 293 kelvin. (The units for conductivity are 
microsiemens per centimeter. Distilled water has conductivity 0.5.) Of course, no 
measurement is exactly correct. NIST knows the variability of its measurements 
very well, so it is quite realistic to assume that the population of all measurements 
of the same rod has the Normal distribution with mean pz equal to the true con- 
ductivity and standard deviation @ = 0.1. Here are six measurements on the 
same standard iron rod, which is supposed to have conductivity 10.1: Ej.) conp 


10.08 9.89 10.05 1016 10.21 10.11 


NIST wants to give the buyer of this iron rod a 90% confidence interval for 
its true conductivity. What is this interval? Follow the four-step process as 
illustrated in Example 16.3. 


16.7 IQ Test Scores. Here are the IQ test scores of 31 seventh-grade girls in a 
Midwest school district:* MWSTIQ 


114 100 104 89 102. 91 114 114) «=103— 105 
106 130° 120° 132) Wh 228° JIS. 119° Sor 72 
111 = 103 74 112 107 103 98 96 112 112 93 
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(a) These 31 girls are an SRS of all seventh-grade girls in the school 
district. Suppose that the standard deviation of IQ scores in this 
population is known to be 0 = 15. We expect the distribution of IQ 
scores to be close to Normal. Make a stemplot of the distribution of these 
31 scores (split the stems) to verify that there are no major departures 
from Normality. You have now checked the “simple conditions” to the 
extent possible. 

(b) Estimate the mean IQ score for all seventh-grade girls in the school 
district, using a 99% confidence interval. Follow the four-step process 
as illustrated in Example 16.3. 


The x confidence interval X + z*a/ Vn for the mean of a Normal population 
illustrates several important properties that are shared by all confidence intervals in 
common use. The user chooses the confidence level, and the margin of error follows 
from this choice. We would like high confidence and also a small margin of error. 
High confidence says that our method almost always gives correct answers. A small 
margin of error says that we have pinned down the parameter quite precisely. The 
factors that influence the margin of error of the z confidence interval are typical of 
most confidence intervals. 


How do we get a small margin of error? The margin of error for the z 
confidence interval is 


‘ » 7 
margin of error = 7%" = 


Va 
This expression has z* and o in the numerator and Vn in the denominator. 
Therefore, the margin of error gets smaller when 


z* gets smaller. Smaller z* is the same as lower confidence level C (look again 
at Figure 16.3). There is a trade-off between the confidence level and the 

A margin of error. To obtain a smaller margin of error from the same data, 
you must be willing to accept lower confidence. 


a is smaller. The standard deviation o measures the variation in the population. 
You can think of the variation among individuals in the population as noise that 
obscures the average value yp. It is easier to pin down ps when @ is small. 


n gets larger. Increasing the sample size n reduces the margin of error for any 
confidence level. Larger samples thus allow more precise estimates. 

A However, because n appears under a square root sign, we must take four 
times as many observations to cut the margin of error in half. 


In practice, we can control the confidence level and sample size, but we can’t 
control a. 


In Example 16.3, psychologists recorded the size of the tip of 20 patrons in a res- 
taurant when a message indicating that the next day’s weather would be good was 
written on their bill. The data gave the mean size of the tip, as a percentage of the 
total bill, as X = 22.21, and we know that o = 2. The 95% confidence interval for 
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the mean percentage tip for all patrons of the restaurant when their bill contains a 
message that the next day’s weather will be good is 


2 
= 22.21 + 1.960 — 
V20 


xt ge. 
Vn 
= 22.21 + 0.88 
The 90% confidence interval based on the same data replaces the 95% critical value 
z* = 1.960 by the 90% critical value z* = 1.645. This interval is 
Co 2 
z*—— = 22.21 + 1.645—— 
Vn V20 


= 22.21 £0.74 


I+ 


x 


Lower confidence results in a smaller margin of error, 0.74 in place of +0.88. 
You can calculate that the margin of error for 99% confidence is larger, £1.15. 
Figure 16.5 compares these three confidence intervals. 


Sample mean = 22.21 Is the estimate 
of the unknown population mean. 


90% confidence 


FIGURE 16.5 
The lengths of three confidence 


intervals for Example 16.4. All 95% confidence 

three are centered at the estimate 

X = 22.21. When the data and the 99% conridence 

sample size remain the same, higher —>— ee eo or 
confidence results in a larger margin 20 21 2D 2B 24 
of error. Mean percentage tip 


If we had a sample of only 10 patrons, you can check that the margin of error 
for 95% confidence increases from +0.88 to £1.24. Cutting the sample size in 
half does not double the margin of error, because the sample size n appears under 
a Square root sign. 


Apply Your Knowledge 


16.8 Confidence Level and Margin of Error. Example 16.1 described 
NHANES survey data on the body mass index (BMI) of 654 young women. 
The mean BMI in the sample was x = 26.8. We treated these data as an 
SRS from a Normally distributed population with standard deviation 
Ge 1.5: 


(a) Give three confidence intervals for the mean BMI yp in this population, 
using 90%, 95%, and 99% confidence. 


(b) What are the margins of error for 90%, 95%, and 99% confidence? 
How does increasing the confidence level change the margin of error 
of a confidence interval when the sample size and population stan- 
dard deviation remain the same? 


16.9 Sample Size and Margin of Error. Example 16.1 described NHANES 
survey data on the body mass index (BMI) of 654 young women. The mean 
BMI in the sample was xX = 26.8. We treated these data as an SRS from a 
Normally distributed population with standard deviation @ = 7.5. 


(a) Suppose that we had an SRS of just 100 young women. What would 
be the margin of error for 95% confidence? 


(b) Find the margins of error for 95% confidence based on SRSs of 
400 young women and 1600 young women. 


(c) Compare the three margins of error. How does increasing the 
sample size change the margin of error of a confidence interval 
when the confidence level and population standard deviation 
remain the same? 


16.10 Retaking the SAT. In Exercise 16.2, we saw that an SRS of 400 high 
school seniors gained an average of x = 12 points in their second attempt 
at the SAT Mathematics exam. Assuming that the change in score has a 
Normal distribution with standard deviation o = 42, we computed a 95% 
confidence interval for the mean change in score pw in the population of all 
high school seniors. 


(a) Find a 90% confidence interval for w based on this sample. 


(b) What is the margin of error for 90%? How does decreasing the 
confidence level change the margin of error of a confidence 
interval when the sample size and population standard deviation 
remain the same? 

(c) Suppose we had an SRS of just 100 high school seniors. What would 
be the margin of error for 95% confidence? 

(d) How does decreasing the sample size change the margin of error of a 
confidence interval when the confidence level and population stan- 
dard deviation remain the same? 


(4) LaunchPad Online Resources 


* The StatBoards video, Confidence Level Changes the Confidence Interval, 
provides examples illustrating the behavior of confidence interval as the 
confidence level changes. 

* The StatBoards video, Sample Size Changes the Confidence Interval, 
provides examples illustrating the behavior of confidence interval as the 
sample size changes. 


CHAPTER 16 SUMMARY 


Chapter Specifics 


A confidence interval uses sample data to estimate an unknown population parameter 
with an indication of how accurate the estimate is and of how confident we are that 
the result is correct. 


Any confidence interval has two parts: an interval calculated from the data and a 
confidence level C. The confidence interval often has the form 


The confidence level is the success rate of the method that produces the interval. 
That is, C is the probability that the method will give a correct answer. If you use 95% 
confidence intervals often, in the long run 95% of your intervals will contain the true 
parameter value. You do not know whether or not a 95% confidence interval calcu- 
lated from a particular set of data contains the true parameter value. 


Summary 
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mw A level C confidence interval for the mean of a Normal population with known 
standard deviation a, based on an SRS of size n, is given by 


«7 


Vn 
m The critical value z* is chosen so that the standard Normal curve has area C between 
ok ok 
—z* and z*. 


xty 


m Other things being equal, the margin of error of a confidence interval gets smaller as 
m_ the confidence level C decreases, 
m_ the population standard deviation o decreases, 


m_ the sample size n increases. 


Link It 


The reason we collect data is not to learn about the individuals that we observed, but to 
infer from the data to some wider population that the individuals represent. Chapters 8 and 
9 tell us that the way we produce the data (sampling, experimental design) affects whether 
we have a good basis for generalizing to some wider population. Chapters 12, 13, 14, and 15 
discuss probability, the formal mathematical tool that determines the nature of the infer- 
ences we make. In particular, Chapter 15 discusses sampling distributions, which tell us how 
repeated SRSs behave and hence what a statistic (in particular, a sample mean) computed 
from our sample is likely to tell us about the corresponding parameter of the population (in 
particular, a population mean) from which the sample was selected. 

In this chapter, we discuss the basic reasoning of statistical estimation, with empha- 
sis on estimating a population mean. To an estimate of the population mean we attach a 
margin of error and a confidence level. The result is a confidence interval. The sampling 
distribution of the sample mean, discussed in Chapter 15, provides the mathematical basis 
for constructing confidence intervals and understanding their properties. Although we ap- 
ply the reasoning of statistical estimation in a simple and artificial setting (we assume we 
know the population standard deviation), we will use the same logic in future chapters to 
construct confidence intervals for population parameters in more realistic settings. 


() LaunchPad Online Resources 


If you are having difficulty with any of the sections of this chapter, these online 
resources should help prepare you to solve the exercises at the end of this chapter. 
° StatTutor starts with a video review of each section and asks a series of ques- 
tions to check your understanding. You may find this helpful if you need addi- 
tional help understanding any of the material in this chapter. 
° LearningCurve provides you with a series of questions about the chapter 
geared to your level of understanding. 


CHECK YOUR SKILLS 


16.11 To give a 99.9% confidence interval for a population 


mean p, you would use the critical value 
(ike = 1960; 
(b) z* = 2.576. 
(One — 21k 


Use the following information for Exercises 16.12 through 16.14. 
A laboratory scale is known to have a standard deviation of 
o = 0.001 gram in repeated weighings. Scale readings in repeated 
weighings are Normally distributed, with mean equal to the true 
weight of the specimen. Three weighings of a specimen on this scale 
give 3.412, 3.416, and 3.414 grams. £ijy)) weicHts 


16.12 A 95% confidence interval for the true weight of this 
specimen is 


(a) 3.414 + 0.00113. 
(b) 3.414 + 0.00065. 
(c) 3.414 + 0.00196. 


16.13 You want a 99% confidence interval for the true 
weight of this specimen. The margin of error for this 


interval will be 


(a) smaller than the margin of error for 95% 
confidence. 

(b) greater than the margin of error for 95% 
confidence. 

(c) about the same as the margin of error for 95% 
confidence. 


16.14 Another specimen is weighed eight times on this 
scale. The average weight is 4.1602 grams. A 99% 


confidence interval for the true weight of this speci- 


men is 
(a) 4.1602 + 0.00032. 
(b) 4.1602 + 0.00069. 


(c) 4.1602 + 0.00091. 


Use the following information for Exercises 16.15 through 16.18. 
The National Assessment of Educational Progress (NAEP) in- 
cludes a mathematical test for eighth-grade students. Scores on the 
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16.19 Student study times. A class survey in a large class 
for first-year college students asked, “About how 
many hours do you study during a typical week?” 
The mean response of the 463 students was x = 15.3 
hours.’ Suppose that we know that the study time 
follows a Normal distribution with standard devia- 
tion a = 8.5 hours in the population of all first-year 
students at this university. 


Exercises 


3387 


test range from O to 500. Suppose that you give the NAEP test to 
an SRS of 900 eighth-graders from a large population in which the 
scores have mean sw = 285 and standard deviation o@ = 125. The 
mean x will vary if you take repeated samples. 


16.15 The sampling distribution of x is approximately 


Normal. It has mean w = 285. What is its standard 
deviation? 


(a) 125. 
(b) 4.167. 
(c) 0.139. 


Suppose that an SRS of 900 eighth-graders has 
x = 288. Based on this sample, a 95% confidence 
interval for p is 


(@) Sil” 2 O27, 
(b) 288 + 8.17. 
(©) As5) 22 hI. 


In Exercise 16.16, suppose that we computed a 99% 
confidence interval for p. 


16.16 


16.17 


(a) This 99% confidence interval would have a 
smaller margin of error than the 95% confidence 
interval. 

(b) This 99% confidence interval would have a 
larger margin of error than the 95% confidence 
interval. 


(c) This 99% confidence interval could have either 
a smaller or a larger margin of error than the 95% 
confidence interval. This varies from sample to 
sample. 


16.18 Suppose that we took an SRS of 1600 eighth-graders 
and found x = 288. Compared with an SRS of 
900 eighth-graders, the margin of error for a 95% 


confidence interval for px is 


(a) smaller. 
(b) larger. 


(c) either smaller or larger but we can’t say which. 


(a) Use the survey result to give a 99% confidence 
interval for the mean study time of all first-year 
students. 

(b) What condition not yet mentioned must be met 
for your confidence interval to be valid? 

I want more muscle. Young men in North America 
and Europe (but not in Asia) tend to think they need 
more muscle to be attractive. One study presented 


16.20 
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16.21 


16.22 
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200 young American men with 100 images of men 
with various levels of muscle.° Researchers measure 
level of muscle in kilograms per square meter (kg/m?) 
of fat-free body mass. Typical young men have about 
20 kg/m’. Each subject chose two images, one that 
represented his own level of body muscle and one 
that he thought represented “what women prefer.” 
The mean gap between self-image and “what women 
prefer” was 2.35 kg/m’. 

Suppose that the “muscle gap” in the population 
of all young men has a Normal distribution with 
standard deviation 2.5 kg/m2. Give a 90% confi- 
dence interval for the mean amount of muscle young 
men think they should add to be attractive to women. 
(They are wrong: women actually prefer a level close 
to that of typical men.) 


An outlier strikes. There were actually 464 re- 
sponses to the class survey in Exercise 16.19. One 
student claimed to study 10,000 hours per week 
(10,000 is more than the number of hours in a year). 
We know he’s joking, so we left out this value. If we 
did a calculation without looking at the data, we 
would get x = 36.8 hours for all 464 students. Now 
what is the 99% confidence interval for the popula- 
tion mean? (Continue to use a = 8.5.) Compare 
the new interval with that in Exercise 16.19. The 
message is clear: always look at your data, because 
outliers can greatly change your result. 


Explaining confidence. A student reads that a re- 
cent poll finds that a 95% confidence interval for the 
mean ideal weight given by adult American women 
is 139 + 1.4 pounds. Asked to explain the meaning 
of this interval, the student says, “95% of all adult 
American women would say that their ideal weight 
is between 137.6 and 140.4 pounds.” Is the student 
right? Explain your answer. 
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16.23 


16.24 


Explaining confidence. You ask another student 
to explain the confidence interval for mean ideal 
weight described in Exercise 16.22. The student 
answers, “We can be 95% confident that future 
samples of adult American women will say that 
their mean ideal weight is between 137.6 and 140.4 
pounds.” Is this explanation correct? Explain your 
answer. 


Explaining confidence. Here is an explanation 
from the Associated Press concerning one of its 
opinion polls. Explain briefly but clearly in what way 
this explanation is incorrect. 


For a poll of 1,600 adults, the variation due to sam- 
pling error is no more than three percentage points 
either way. The error margin is said to be valid at 
the 95 percent confidence level. This means that, 
if the same questions were repeated in 20 polls, the 
results of at least 19 surveys would be within three 
percentage points of the results of this survey. 


Exercises 16.25 to 16.27 ask you to answer questions from data. 
Assume that the “simple conditions” hold in each case. The exercise 
statements give you the State step of the four-step process. In your 
work, follow the Plan, Solve, and Conclude steps, illustrated in 
Example 16.3 for a confidence interval. 


16.25 


16.26 


Pulling wood apart. How heavy a load (pounds) is 
needed to pull apart pieces of Douglas fir 4 inches 
long and 1.5 inches square? Here are data from stu- 


dents doing a laboratory exercise: £)),) woop 
33,190 31,860 32,590 26,520 33,280 
325320) 333020 322030: 30,460 32,700 
23,040 30,930 32,720 33,650 32,340 
24,050 30,170 31,300 28,730 31,920 


(a) We are willing to regard the wood pieces prepared 
for the lab session as an SRS of all similar pieces of 
Douglas fir. Engineers also commonly assume that 
characteristics of materials vary Normally. Make a 
graph to show the shape of the distribution for these 
data. Does it appear safe to assume that the Normal- 
ity condition is satisfied? Suppose that the strength of 
pieces of wood like these follows a Normal distribution 
with standard deviation 3000 pounds. 

(b) Give a 95% confidence interval for the mean 
load required to pull the wood apart. 


Bone loss by nursing mothers. Breastfeeding 
mothers secrete calcium into their milk. Some of 
the calcium may come from their bones, so moth- 
ers may lose bone mineral. Researchers measured the 
percent change in mineral content of the spines of 
47 mothers during three months of breastfeeding.’ 
Here are the data: BONELS 
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d7 5 45 07 oe 255 Sage 68 4G (b) Use a 99% confidence interval to estimate the 
’ ‘ ‘ ; , ; ; ‘ j ; mean percent change in the population. 
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16.27 This wine stinks. Sulfur compounds cause “off- 
odors” in wine, so winemakers want to know the odor 
threshold, the lowest concentration of a compound 
that the human nose can detect. The odor threshold 
for dimethyl sulfide (DMS) in trained wine tasters is 
about 25 micrograms per liter of wine (ug/L). The 
untrained noses of consumers may be less sensitive, 


30 30 4 3) MA Sl 2) I) 


(a) Assume that the standard deviation of the 
odor threshold for untrained noses is known to be 
ao = 7 pg/L. Briefly discuss the other two “simple 
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(a) The researchers are willing to consider these conditions,” using a stemplot to verify that the distri- 
47 women to be an SRS from the population of all bution is roughly symmetric with no outliers. 

nursing mothers. Suppose that the percent change (b) Give a 95% confidence interval for the mean 

in this population has standard deviation @ = 2.5%. DMS odor threshold among all students. 

Make a stemplot of the data to verify that the data 16.28 Why are larger samples better? Statisticians prefer 
follow a Normal distribution quite closely. (Don’t large samples. Describe briefly the effect of increas- 
forget that you need both a 0 and a —O stem because ing the size of a sample on the margin of error of a 
there are both positive and negative values.) 95% confidence interval. 


Exploring the Web 


A statistics glossary. An editorial was published in the Journal of the National 
Cancer Institute, Vol. 101, No. 23 (December 2, 2009) that announced some 
online resources for journalists, including a statistics glossary. The glossary can be 
found at www.oxfordjournals.org/our_journals/jnc/resource/ 
statistics%20glossary.pdfE. Read the definition of a confidence inter- 
val. Is this an accurate definition? Explain your answer. 


16.30 Getting around No Child Left Behind. The PBS website has an inter- 
esting article from 2007 discussing how school districts were getting around 
certain requirements of the No Child Left Behind law. You can find the article at 
www.pbs.org/newshour/bb/education-july-dec07-nclb 08-14/. 
What does the article have to say about the use of confidence intervals in 
reporting results about the percent of students passing proficiency tests? 
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In this chapter 


Tests of Significance: we cover 


17.1. The reasoning of tests 
of significance 


The Basics Fe cts ais 


17.3 P-value and statistical 
significance 
17.4 Tests for a population mean 
onfidence intervals are one of the two most common types of statistical infer- Ifo, Significance trom/a table 
ence. Use a confidence interval when your goal is to estimate a population pa- 17.6 Resampling: significance 
rameter. The second common type of inference, called tests of significance, has from a simulation* 
a different goal: to assess the evidence provided by data about some claim concern- 
ing a population parameter. Here is the reasoning of statistical tests in a nutshell. 


EXAMPLE 17.1 I’m a Good Free-Throw Shooter 


| claim that | make 75% of my basketball free throws. To test my claim, you ask me 
to shoot 20 free throws. | make only eight of the 20. “Aha!” you say. “Someone who 
makes 75% of his free throws would almost never make only eight out of 20. So | 
don’t believe your claim.” 

Your reasoning is based on asking what would happen if my claim were true and we 
repeated the sample of 20 free throws many times—| would almost never make as few 
as eight. This outcome is so unlikely that it gives strong evidence that my claim is not true. 

You can say how strong the evidence against my claim is by giving the probability 
that | would make as few as eight out of 20 free throws if | really make 75% in the 
long run. This probability is 0.0009. | would make as few as eight of 20 only nine 
times in 10,000 tries in the long run if my claim to make 75% were true. The small 
probability convinces you that my claim is false. m 
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The Reasoning of a Statistical Test applet animates Example 17.1. You can ask a 
player to shoot free throws until the data do (or don’t) convince you that he makes 
fewer than 75%. Significance tests use an elaborate vocabulary, but the basic idea is 
simple: an outcome that would rarely happen if a claim were true is good evidence that 
the claim is not true. 


The reasoning of statistical tests, like that of confidence intervals, is based on 
asking what would happen if we repeated the sample or experiment many times. We 
will act as if the “simple conditions” listed on page 374 are true: we have a perfect 
SRS from an exactly Normal population with standard deviation 0 known to us. 
Here is an example we will explore. 


Diet colas use artificial sweeteners to avoid sugar. These sweeteners gradually lose 
their sweetness over time. Manufacturers therefore test new colas for loss of sweet- 
ness before marketing them. Trained tasters sip the cola along with drinks of standard 
sweetness and score the cola on a “sweetness score” of 1 to 10, with larger scores 
corresponding to greater sweetness. The cola is then stored for a month at high 
temperature to imitate the effect of four months’ storage at room temperature. Each 
taster scores the cola again after storage. This is a matched pairs experiment. Our 
data are the differences (score before storage minus score after storage) in the tasters’ 
scores. The bigger these differences, the bigger the loss of sweetness. 

Suppose we know that, for any cola, the sweetness loss scores vary from taster 
to taster according to a Normal distribution with standard deviation @ = 1. The mean 
pb for all tasters measures loss of sweetness and is different for different colas. 

Here are the sweetness losses for a cola currently on the market, as measured by 
10 trained tasters: 


1.604 O05 —2.0 1.5 —Wl 1.3 —-O1 -—O3 “12 
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The average sweetness loss is given by the sample mean x = 0.3, so that on aver- 
age the 10 tasters found a small loss of sweetness. Also, more than half (six) of the 
tasters found a loss of sweetness. Are these data good evidence that the cola lost 
sweetness in storage? m 


The reasoning is the same as in Example 17.1. We make a claim and ask if the 
data give evidence against it. We seek evidence that there is a sweetness loss, so the 
claim we test is that there is not a loss. In that case, the mean loss for the population 
of all trained testers would be pw = 0. 


e Ifthe claim that = 0 is true, the sampling distribution of x from 10 tasters 
is Normal with mean w = O and standard deviation 


o 1 


Vn VIO 


« This is just like the calculations we did in Chapter 15 (see Example 15.5 on 
page 354) and Chapter 16 (see Example 16.1 on page 375). Figure 17.1 shows 
this sampling distribution. We can judge whether any observed x is surprising 
by locating it on this distribution. 


= 0.316 
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FIGURE 17.1 

Sampling distribution If the cola does not lose sweetness in 
as of X when =O storage, the mean score X for 10 tasters 
will have this sampling distribution. The 
actual result for one cola was x = 0.3. 
That could easily happen just by chance. 
Another cola had X = 1.02. That's so 
far out on the Normal curve that it is 
good evidence that this cola did lose 
sweetness. 


For this cola, 10 tasters had mean loss x = 0.3. It is clear from Figure 17.1 
that an x this large is not particularly surprising. It could easily occur just by 
chance when the population mean is w = 0. That 10 tasters found x = 0.3 is 
not strong evidence that this cola loses sweetness. 


Here are the sweetness losses for a new cola, as measured by 10 trained tasters: 
20 04 0.7 20 -04 2.2 -13 1.2 1.1 2.3 = 


The average sweetness loss is given by the sample mean x = 1.02. Most scores are 
positive. That is, most tasters found a loss of sweetness. But the losses are small, 
and two tasters (the negative scores) thought the cola gained sweetness. Are these 
data good evidence that the cola lost sweetness in storage? m 


The taste test for the new cola produced x = 1.02. That’s way out in the tail of 
the Normal curve in Figure 17.1—so far out that an observed value this large would 
rarely occur just by chance if the true were 0. This observed value is good evidence 
that the true yp is in fact greater than 0, that is, that the cola lost sweetness. The 
manufacturer must reformulate the new cola and try again. 


Apply Your Knowledge 


17.1 Student Attitudes. The Survey of Study Habits and Attitudes (SSHA) is 
a psychological test that measures students’ study habits and attitude toward 
school. The survey yields several scores, one of which measures student at- 
titudes toward studying. The mean student attitude score for college students 
is about 50, and the standard deviation is about 15. A researcher in the Phil- 
ippines is concerned about the declining performance of college graduates on 
professional licensure and board exams. She suspects that poor attitudes of 
students are partly responsible for the decline and that the mean for college 
seniors who plan to take professional licensure or board exams is less than 50. 
She gives the SSHA to an SRS of 225 college seniors in the Philippines who 
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plan to take professional licensure or board exams. Suppose we know that 
the student attitude scores in the population of such students are Normally 
distributed with standard deviation 0 = 15. 


(a) We seek evidence against the claim that w = 50. What is the sampling 
distribution of the mean score x of a sample of 225 students if the 
claim is true? Draw the density curve of this distribution. (Sketch a 
Normal curve, then mark on the axis the values of the mean and one, 
two, and three standard deviations of the sampling distribution on 
either side of the mean.) 


(b) Suppose that the sample data give x = 48.9. Mark this point on the 
axis of your sketch. In fact, the result was x = 47.4. Mark this point 
on your sketch. Using your sketch, explain in simple language why one 
result is good evidence that the mean score of all college seniors in the 
Philippines who plan to take professional licensure or board exams is 
less than 50 and why the other outcome is not. 


17.2 Measuring Conductivity. The National Institute of Standards and 
Technology (NIST) supplies a “standard iron rod” whose electrical conductiv- 
ity is supposed to be exactly 10.1. Is there reason to think that the true con- 
ductivity is not 10.1? To find out, NIST measures the conductivity of one rod 
six times. Repeated measurements of the same thing vary, which is why NIST 
makes six measurements. These measurements are an SRS from the popula- 
tion of all possible measurements. This population has a Normal distribution 
with mean p equal to the true conductivity and standard deviation 0 = 0.1. 


(a) We seek evidence against the claim that w = 10.1. What is the sam- 
pling distribution of the mean x in many samples of six measurements 
of one rod if the claim is true? Make a sketch of the Normal curve for 
this distribution. (Draw a Normal curve, then mark on the axis the 
values of the mean and one, two, and three standard deviations on 
either side of the mean.) 

(b) Suppose that the sample mean is x = 10.09. Mark this value on the 
axis of your sketch. Another rod has x = 9.95 for six measurements. 
Mark this value on the axis as well. Explain in simple language why 
one result is good evidence that the true conductivity differs from 10.1 
and why the other result gives no reason to doubt that 10.1 is correct. 


A statistical test starts with a careful statement of the claims we want to compare. 
In Example 17.3, we saw that the taste test data are not plausible if the new cola loses 
no sweetness. Because the reasoning of tests looks for evidence against a claim, we 
start with the claim we seek evidence against, such as “no loss of sweetness.” 


Null and Alternative Hypotheses 


The claim tested by a statistical test is called the null hypothesis. The test is de- 
signed to assess the strength of the evidence against the null hypothesis. Usually the 
null hypothesis is a statement of “no effect” or “no difference.” 


The claim about the population that we are trying to find evidence for is the alterna- 
tive hypothesis. The alternative hypothesis is one-sided if it states that a param- 
eter is larger than or smaller than the null hypothesis value. It is two-sided if it states 
that the parameter is different from the null value (it could be either smaller or larger). 
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We abbreviate the null hypothesis as Hy and the alternative hypothesis as H,. 
Hypotheses always refer to a population parameter, not to a particular sample 
outcome. Be sure to state Hy and H, in terms of population parameters. Because 
H, expresses the effect that we hope to find evidence for, it is sometimes 

easier to begin by stating H, and then set up Hy as the statement that the hoped-for 
effect is not present. 

In Examples 17.2 and 17.3, we are seeking evidence for loss in sweetness. The 

null hypothesis says “no loss” on the average in a large population of tasters. The 
alternative hypothesis says “there is a loss.” So the hypotheses are 


Ho: wp = 0 
Hy: w > 0 


The alternative hypothesis is one-sided because we are interested only in whether 
the cola lost sweetness.' 


Does the job satisfaction of assembly workers differ when their work is machine- 
paced rather than self-paced? Assign workers either to an assembly line moving at 
a fixed pace or to a self-paced setting. All subjects work in both settings, in random 
order. This is a matched pairs design. After two weeks in each work setting, the work- 
ers take a test of job satisfaction. The response variable is the difference in satisfac- 
tion scores, self-paced minus machine-paced. 

The parameter of interest is the mean pw of the differences in scores in the popu- 
lation of all assembly workers. The null hypothesis says that there is no difference 
between self-paced and machine-paced work, that is, 


Ho: w= O 
The authors of the study wanted to know if the two work conditions have different 


levels of job satisfaction. They did not specify the direction of the difference. The 
alternative hypothesis is therefore two-sided: 


H,: w #O = 


The hypotheses should express the hopes or suspicions we have before we 

A see the data. It is cheating to first look at the data and then frame hypotheses to 

fit what the data show. For example, the data for the study in Example 17.4 

showed that the workers were more satisfied with self-paced work, but this should 

not influence the choice of H,. If you do not have a specific direction firmly in mind 
in advance, use a two-sided alternative. 


(4) LaunchPad Online Resources 


* The StatBoards video, Formulating Hypotheses, provides examples of 
formulating null and alternative hypotheses. It will help you understand what 
type of hypothesis is used for what type of question. 


HONEST 
ik HYPOTHESES? 


Chinese and Japanese people, for 
whom the number 4 is unlucky, die 
more often on the fourth day of 

the month than on other days. The 
authors of a study did a statistical 
test of the claim that the fourth day 
has more deaths than other days and 
found good evidence in favor of this 
claim. Can we trust this? Not if the 
authors looked at all days, picked the 
one with the most deaths, then made 
“this day is different” the claim to 

be tested. A critic raised that issue, 
and the authors replied: “No, we had 
day 4 in mind in advance, so our test 
was legitimate.” 


Apply Your Knowledge 


17.3 Student Attitudes. State the null and alternative hypotheses for the study 
of college seniors in the Philippines’ attitudes described in Exercise 17.1. (Is 
the alternative hypothesis one-sided or two-sided?) 
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17.4 Measuring Conductivity. State the null and alternative hypotheses for 
the study of electrical conductivity described in Exercise 17.2. (Is the alter- 
native hypothesis one-sided or two-sided?) 

17.5 Too Early. The examinations in a large multi-section statistics class are 
scaled after grading so that the mean score is 75. The professor thinks that 
students in the 8:00 A.M. class have trouble paying attention because they are 
sleepy and suspects that these students have a lower mean score than the class 
as a whole. The students in the 8:00 A.M. class this semester can be considered 
a sample from the population of all students in the course, so the professor 
compares their mean score with 75. State the hypotheses Hy and H,. 

17.6 Women’s Incomes. The average income of American women who work 
full-time and have only a high school degree is $33,230. You wonder whether 
the mean income of female graduates from your local high school who work 
full-time but have only a high school degree is different from the national 
average. You obtain income information from an SRS of 62 female gradu- 
ates who work full-time and have only a high school degree and find that 
x = $32,052. What are your null and alternative hypotheses? 

17.7 Stating Hypotheses. In planning a study on the number of days in the last 
30 that high school students texted while driving sometime during the day, a 
researcher states the hypotheses as 


Hy: x = 15 days 
Fx = > idays 


What’s wrong with this? 


The idea of stating a null hypothesis that we want to find evidence against seems 
odd at first. It may help to think of a criminal trial. The defendant is “innocent 
until proven guilty.” That is, the null hypothesis is innocence and the prosecution 
must try to provide convincing evidence against this hypothesis. That’s exactly how 
statistical tests work, though in statistics we deal with evidence provided by data 
and use a probability to say how strong the evidence is. 

The probability that measures the strength of the evidence against a null hypoth- 
esis is called a P-value. Statistical tests generally work like this: 


Test Statistic and P-Value 


A test statistic calculated from the sample data measures how far the data diverge 
from what we would expect if the null hypothesis Hy were true. Large values of the 
statistic show that the data are not consistent with Hp. 

The probability, computed assuming that H) is true, that the test statistic would take 
a value as extreme or more extreme than that actually observed is called the P-value 
of the test. The smaller the P-value, the stronger the evidence against Hy provided 
by the data. 


Small P-values are evidence against Hp because they say that the observed result 
would be unlikely to occur if Hp were true. Large P-values fail to give evidence 
against Hp. Statistical software will give you the P-value of a test when you enter 
your null and alternative hypotheses and your data. So your most important task is 
to understand what a P-value says. 
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The study of sweetness loss in Examples 17.2 and 17.3 tests the hypotheses 
Ho: w= O 
H,: w > O 


Because the alternative hypothesis says that ~ > O, values of x greater than O favor H, 
over Ho. The test statistic compares the observed X with the hypothesized value pw = O. 
For now, let's concentrate on the P-value. The experiment presented in Examples 17.2 and 
17.3 actually compared two colas. For the first cola, the 10 tasters found mean sweetness 
loss X = 0.3. For the second, the data gave x = 1.02. The P-value for each test is the 
probability of getting an x this large when the mean sweetness loss is really w = O. 

The shaded area in Figure 17.2 shows the P-value when X = 0.3. The Normal 
curve is the sampling distribution of X when the null hypothesis Ho: w = O is 
true. A Normal probability calculation (Exercise 17.8) shows that the P-value is 
P(x = 0.3) = 0.1714. 


Ho: w= 0 


Has @u>O0 Sample Mean =0.3 
Opu<o P-value = 0.1714 
Ouzod 

o=|1 

n= 10 


© I have data, and the 
observed x= 0.3 | 


© The truth about the 
population is u=|0 
UPDATE 


NAGS —0.632 0.000 0.632 1.265 


RESET 
Sa 


FIGURE 17.2 

The one-sided P-value for the cola with mean sweetness loss x = 0.3 in Example 17.4. The figure shows 
both the input and the output for the P-Value of a Test of Significance applet. Note that the P-value is the 
shaded area under the curve, not the unshaded area. 


A value as large as X = 0.3 would occur just by chance in 17% of all samples 
when Ho: w = O is true. So observing x = 0.3 is not strong evidence against Ho. On 
the other hand, you can calculate that the probability that x is 1.02 or larger when 
in fact w = O is only 0.0006. We would very rarely observe a mean sweetness loss 
of 1.02 or larger if Ho were true. This small P-value provides strong evidence against 
Ho and in favor of the alternative H,: 4 > O. = 


Figure 17.2 is actually the output of the P-Value of a Test of Significance applet, 
along with the information we entered into the applet. This applet automates the 
work of finding P-values for samples of size 50 or smaller under the “simple condi- 
tions” for inference about a mean. 

The alternative hypothesis sets the direction that counts as evidence against 
Hg. In Example 17.5, only large positive values count because the alternative is 
one-sided on the high side. If the alternative is two-sided, both directions count. 
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The study of job satisfaction in Example 17.4 requires that we test 
Ho: w= O 
Ha: bw #O 


Suppose we know that differences in job satisfaction scores (self-paced minus machine- 
paced) in the population of all workers follow a Normal distribution with standard 
deviation o = 60. 

Data from 18 workers give x = 17. That is, these workers prefer the self-paced 
environment on the average. Because the alternative is two-sided, the P-value is 
the probability of getting an x at least as far from fs = O in either direction as the 
observed x = 17. 

Enter the information for this example into the P-Value of a Test of Significance 
applet and click “Show P.” Figure 17.3 shows the applet output as well as the infor- 
mation we entered. The P-value is the sum of the two shaded areas under the Normal 
curve. It is P = 0.2293. Values as far from O as x = 17 (in either direction) would 
happen 23% of the time when the true population mean is w~ = O. An outcome that 
would occur so often when Hp is true is not good evidence against Ho. m 


Ho: w= 0 


Ha: OQu>od Sample Mean =17 
Op<o0 P-value = 0.2293 
@Qwsd 

o= 60 

n= 18 


© I have data, and the 
observed X= 17 


© The truth about the 
population is p = 


Hi 


UPDATE 


—56.569 —28.284 0.000 28.284 56.569 


—..- Koei mm 


RESET 


FIGURE 17.3 
The two-sided P-value for Example 17.6. The figure shows both the input and the output for the P-Value of a 
Test of Significance applet. Note that the P-value is the shaded area under the curve, not the unshaded area. 


The conclusion of Example 17.6 is not that Hp is true. The study looked for 
evidence against Ho: w = O and failed to find strong evidence. That is all we can 
say. No doubt the mean pw for the population of all assembly workers is not exactly 
equal to 0. A large enough sample would give evidence of the difference, even if it 
is very small. Tests of significance assess the evidence against Hp. If the evidence 
is strong, we can confidently reject Hp in favor of the alternative. Failing to find evi- 

dence against Hy means only that the data are not inconsistent with Ho, not that 
A we have clear evidence that Hp is true. Only data that are inconsistent with 
Hp provide evidence against Hp. 

In Examples 17.5 and 17.6, we decided that P-value P = 0.0006 was strong evi- 

dence against the null hypothesis and that P-values P = 0.1714 and P = 0.2293 


17.3 P-Value and Statistical Significance 


did not give convincing evidence. There is no rule for how small a P-value we 
should require to reject Hp—it’s a matter of judgment and depends on the specific 
circumstances. 

Nonetheless, we can compare a P-value with some fixed values that are in common 
use as standards for evidence against Hj. The most common fixed values are 0.05 and 
0.01. If P = 0.05, there is no more than one chance in 20 that a sample would give 
evidence this strong just by chance when Hp is actually true. If P < 0.01, we have 
a result that in the long run would happen no more than once per 100 samples if Hy 
were true. These fixed standards for P-values are called significance levels. We use 
a, the Greek letter alpha, to stand for a significance level. 


Statistical Significance 


If the P-value is as small as or smaller than @, we say that the data are statistically 
significant at level a. The quantity a is called the significance level or the level of 
significance. 


“Significant” in the statistical sense does not mean “important.” It means sim- 
A ply “not likely to happen just by chance.” The significance level a makes “not 
likely” more exact. Significance at level 0.01 is often expressed by the state- 
ment, “The results were significant (P < 0.01).” Here P stands for the P-value. The 
actual P-value is more informative than a statement of significance because it allows 
us to assess significance at any level we choose. For example, a result with P = 0.03 
is significant at the a = 0.05 level but is not significant at the a = 0.01 level. To 
avoid confusion, we will use “statistically significant” rather than just “significant” 
in this chapter. However, in research papers and media publications, you will often 
see the word “significant” rather than the phrase “statistically significant.” In later 
chapters we will use both. 

It is good practice to interpret findings of statistical significance in the context of 
the problem for which data were collected. For example, in Example 17.5, statistical 
significance implies something about the loss of sweetness in diet colas. A sample 
mean of x = 0.3 is not statistically significant at the a = 0.05 level. We would 
interpret this as meaning that our data did not provide strong evidence that diet 
cola, on average, loses sweetness after being stored one month at high temperature. 


(4) LaunchPad Online Resources 


* The StatClips video, P-value Interpretation, provides examples of interpreting 
P-values. 
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significance levels 


TQ siGNIFICANCE 
Lik STRIKES DOWN 
A NEW DRUG 


The pharmaceutical company 

Pfizer spent $1 billion developing a 
new cholesterol-fighting drug. The 
final test for its effectiveness was a 
clinical trial with 15,000 subjects. 
To enforce double-blindness, only 
an independent group of experts 
saw the data during the trial. Three 
years into the trial, the monitors 
declared that there was a statistically 
significant excess of deaths and of 
heart problems in the group assigned 
to the new drug. Pfizer ended the 
trial. There went $1 billion. 


Apply Your Knowledge 


17.8 Sweetening Colas: Find the P-Value. The P-value for the first cola in 
Example 17.5 is the probability (taking the null hypothesis 4 = 0 to be 
true) that x takes a value at least as large as 0.3. 


(a) What is the sampling distribution of x when ww = 0? This distribution 
appears in Figure 17.2. 

(b) Doa Normal probability calculation to find the P-value. Your result 
should agree with Example 17.5 up to roundoff error. 

Job Satisfaction: Find the P-Value. The P-value in Example 17.6 is 

the probability (taking the null hypothesis 4 = 0 to be true) that x takes a 

value at least as far from O as 17. 


17.9 
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17.10 


eS 17.11 


eS 17.12 


(a) What is the sampling distribution of x when ys = 0? This distribution 
is shown in Figure 17.3. 


(b) Do aNormal probability calculation to find the P-value. (Recall that 
the alternative hypothesis is two-sided.) Your result should agree with 
Example 17.6 up to roundoff error. 


Lorcaserin and Weight Loss. A double-blind, randomized comparative 
experiment compared the effect of the drug lorcaserin and a placebo on 
weight loss in overweight adults. All subjects also underwent diet and 
exercise counseling. The study reported that, after one year, patients in the 
lorcaserin group had an average weight loss of 5.8 kilograms (kg), while 
those on the placebo had an average weight loss of 2.2 kg (P < 0.001).? 
Explain to someone who knows no statistics why these results mean that 
there is good reason to think that lorcaserin works. Include an explanation 
of what P < 0.001 means. 


Student Attitudes. Exercise 17.1 describes a study of the attitudes of col- 
lege seniors in the Philippines. You stated the null and alternative hypoth- 
eses in Exercise 17.3. 


(a) One sample of 225 students had mean student attitude score 
x = 48.9. Enter this x, along with the other required information, 
into the P-Value of a Test of Significance applet. What is the P-value? 
Is this outcome statistically significant at the a = 0.05 level? At the 
a = 0.01 level? 


(b) Another sample of 225 students had x = 47.4. Use the applet to 
find the P-value for this outcome. Is it statistically significant at the 
a = 0.05 level? At the a = 0.01 level? 


(c) Explain briefly why these P-values tell us that one outcome is strong 
evidence against the null hypothesis and that the other outcome is not. 


Measuring Conductivity. Exercise 17.2 describes six measurements of 
the electrical conductivity of an iron rod. You stated the null and alterna- 
tive hypotheses in Exercise 17.4. 


(a) One set of measurements has mean conductivity x = 10.09. Enter 
this x, along with the other required information, into the P-Value 
of a Test of Significance applet. What is the P-value? Is this outcome 
statistically significant at the a = 0.05 level? At the a = 0.01 level? 


(b) Another set of measurements has x = 9.95. Use the applet to find 
the P-value for this outcome. Is it statistically significant at the 
a = 0.05 level? At the a = 0.01 level? 

(c) Explain briefly why these P-values tell us that one outcome is strong 
evidence against the null hypothesis and that the other outcome is 
not. 


We have used tests for hypotheses about the mean p of a population, under the 


“simple conditions,” to introduce tests of significance. The big idea is the reasoning 


of a test: data that would rarely occur if the null hypothesis Hy were true provide evi- 


dence that Hp is not true. The P-value gives us a probability to measure “would rarely 
occur.” In practice, the steps in carrying out a test of significance mirror the overall 


four-step process for organizing realistic statistical problems. 
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Tests of Significance: The Four-Step Process 


STATE: What is the practical question that requires a statistical test? 


PLAN: Identify the parameter, state null and alternative hypotheses, and choose the 
type of test that fits your situation. 


SOLVE: Carry out the test in three phases: 

1. Check the conditions for the test you plan to use. 
2. Calculate the test statistic. 

3. Find the P-value. 


CONCLUDE: Return to the practical question to describe your results in this setting. 


Once you have stated your question, formulated hypotheses, and checked the 
conditions for your test, you or your software can find the test statistic and P-value 
by following a rule. Here is the rule for the test we have used in our examples. 


z Test for a Population Mean 


Draw an SRS of size n from a Normal population that has unknown mean p and known 
standard deviation o. To test the null hypothesis that w has a specified value, 


Ho: 2 = bo 

calculate the one-sample z test statistic 
R= AUS 
a/Vn 


In terms of a variable Z having the standard Normal distribution, the P-value for a 
test of Hy against 


ise > fim is Ie ex) 


A= fa is PZ = 2) _ eo 


Hy: F bo is = 2P(Z=|z)) 


As promised, the test statistic y measures how far the observed sample mean X 
deviates from the hypothesized population value fy. The measurement is in the 
familiar standard scale obtained by dividing by the standard deviation of x. So we 
have a common scale for all z tests, and the 68—95—99.7 rule helps us see at once if 
x is far from fo. The pictures that illustrate the P-value look just like the curves in 
Figures 17.2 and 17.3, except that they are in the standard scale. 
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§STATE®) The National Center for Health Statistics reports that the systolic blood pres- 
sure for males 35 to 44 years of age has mean 128 and standard deviation 15. The 
medical director of a large company looks at the medical records of 72 executives 
in this age group and finds that the mean systolic blood pressure in this sample is 
X = 126.07. Is this evidence that the company’s executives have a different mean 
systolic blood pressure from the general population? 


PLAN: The null hypothesis is “no difference” from the national mean uo = 128. The 
alternative is two-sided because the medical director did not have a particular direc- 
tion in mind before examining the data. So the hypotheses about the unknown mean 
pe of the executive population are 


Ho: w= 128 

H,: w #128 
We know that the one-sample z test is appropriate for these hypotheses under the 
“simple conditions.” 


ESQ REA As part of the “simple conditions,” suppose we are willing to assume that 
executives’ systolic blood pressures follow a Normal distribution with standard devia- 
tion o = 15. Software can now calculate z and P for you. Going ahead by hand, the 
test statistic is 


X— bo 126,07 — 128 
a/Vn 15/V72 
= —1.09 


Z 


To help find the P-value, sketch the standard Normal curve and mark on it the 
observed value of z. Figure 17.4 shows that the P-value is the probability that a 
standard Normal variable Z takes a value at least 1.09 away from zero. From Table A 
or software, this probability is 


P = 2P(Z < —1.09) = (2)(0.1379) = 0.2758 


FSOMSRD More than 27% of the time, an SRS of size 72 from the general male 
population would have a mean systolic blood pressure at least as far from 128 as that 
of the executive sample. The observed xX = 126.07 is therefore not good evidence 
that executives differ from other men. m 


are true. In practice, you must verify these conditions. 


In this chapter, we are acting as if the “simple conditions” stated on page 374 
g g 


1. SRS: The most important condition is that the 72 executives in the sample 
are an SRS from the population of all middle-aged male executives in the 
company. We should check this requirement by asking how the data were 
produced. If medical records are available only for executives with recent 
medical problems, for example, the data are of little value for our purpose 
because of the obvious health bias. It turns out that all executives are given 
a free annual medical exam, and that the medical director selected 72 exam 
results at random. 

2. Normal distribution: We should also examine the distribution of the 72 ob- 
servations to look for signs that the population distribution is not Normal. 


3. Known a: It really is unrealistic to suppose that we know that 0 = 15. We will 
see in Chapter 20 that it is easy to do away with the need to know o. 


17.4 Tests for a Population Mean 


FIGURE 17.4 

The P-value for the two-sided test in 
Example 17.7. The observed value of the 
test statistic is z = —1.09. 


The two-sided P-value for 
2 = —1.09 Is the area at least 


1.09 away from 0 in either 
direction, P = 0.2758. 


Standard Normal 
curve 


Area = 0.1379 
9 


—1.09 0) 1.0 


Area = 0.1379 


(3) LaunchPad Online Resources 


* The StatBoards video, Calculating the Test Statistic and Finding the P-value, 
One Tail, provides examples of the four-step process for one tail tests for 
population means. 


* The StatBoards video, Calculating the Test Statistic and Finding the P-value, 
Two Tail, provides examples of the four-step process for two tail tests for 
population means. 


* The StatBoards video, Interpreting the Result of a Test in Context, provides 
examples of the Conclude step of the four-step process in tests for population 
means. 


Apply Your Knowledge 


17.13 The z Statistic. Published reports of research work are terse. They often 
report just a test statistic and P-value. For example, the conclusion of 


Example 17.7 might be stated as “(zy = —1.09, P = 0.2758).” Find the 
values of the one-sample z statistic needed to complete these conclusions: 
(a) For the first cola in Example 17.5, y = ?, P = 0.1714. 

(b) For the second cola in Example 17.5, x = ?, P = 0.0006. 

(c) For Example 17.6, z = ?, P = 0.2293. 


17.14 Measuring Conductivity. Here are six measurements of the electrical 
conductivity of an iron rod: ely COND 


10.08 9.89 10.05 10.16 10.21 10.11 


The iron rod is supposed to have conductivity 10.1. Do the measurements 
give good evidence that the true conductivity is not 10.1? 

The six measurements are an SRS from the population of all results we 
would get if we kept measuring conductivity forever. This population has 
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a Normal distribution with mean equal to the true conductivity of the rod 
and standard deviation 0.1. Use this information to carry out a test, follow- 
ing the four-step process as illustrated in Example 17.7. 

17.15 Bad Weather, Bad Tip? People tend to be more generous after receiving 
good news. Are they less generous after receiving bad news? The average tip left 
by adult Americans is 20%. Give 20 patrons of a restaurant a message on their 
bill warning them that tomorrow’s weather will be bad and record the percent- 
age tip they leave. Here are the tips as a percentage of the total bill: TIPS 


18.0 19.1 19.2 18.8 184 19.0 18.5 16.1 16.8 18.2 
14.0 17.0 13.6 17.5 20.0 20.2 18.8 18.0 23.2 19.4 


Suppose that percentage tips are Normal with @ = 2. Is there good evidence 
that the mean percentage tip is less than 20? Follow the four-step process as 
illustrated in Example 17.7. 


Statistics in practice uses technology (graphing calculator or software) to get 
P-values quickly and accurately. In the absence of suitable technology, you can get 
approximate P-values quickly by comparing the value of your test statistic with criti- 
cal values from a table. For the z statistic, the table is Table C, the same table we 
used for confidence intervals. 

Look at the bottom row of critical values in Table C, labeled 7*. At the top of the 
table, you see the confidence level C for each x*. At the bottom of the table, you 
see both the one-sided and two-sided P-values for each z*. Values of a test statistic z 
that are farther out than a z* (in the direction given by the alternative hypothesis) 
are statistically significant at the level that matches z*. 


Significance from a Table of Critical Values 


To find the approximate P-value for any x statistic, compare z (ignoring its sign) with 
the critical values z* at the bottom of Table C. If x falls between two values of z*, the 
P-value falls between the two corresponding values of P in the “One-sided P” or the 
“Two-sided P” row of Table C. 


z* 


2.054 2.326 


One-sided P 


0.02 


0.01 


The z statistic for a one-sided test is z = 2.13. How statistically significant is this 
result? Compare z = 2.13 with the z* row in Table C. 

It lies between z* = 2.054 and z* = 2.326. So the P-value lies between the 
corresponding entries in the “One-sided P” row, which are P = 0.02 and P = 0.01. 
This z is statistically significant at the a = 0.02 level and is not statistically significant 
at the a = 0.01 level. 

Figure 17.5 illustrates the situation. The shaded area under the Normal curve is 
the P-value for z = 2.13. You can see that P falls between the areas to the right of 
the two critical values, for P = 0.02 and P = 0.01. 


“This material can be skipped if you use software to compute P-values. 


17.5 Significance From a Table 


FIGURE 17.5 

Is it significant? The test statistic 
value z = 2.13 falls between 

the critical values required for 
significance at the a = 0.02 and 
a = 0.01 levels. So the test is 
significant at a = 0.02 and is not 
significant at a = 0.01. 


Standard Normal 
curve 


Values of 2 to the 

right of this point Values of z to the 

are significant at right of this point 

a = 0.02. are significant at 
a = 0.01. 


2 = 2.13 
The z statistic in Example 17.7 is z = —1.09. The alternative hypothesis is two- ri 
sided. Compare z = —1.09 (ignoring the minus sign) with the z* row in Table C. 3 1.036 1.282 
It lies between z* = 1.036 and z* = 1.282. So the P-value lies between the match- Two-sided P | 0.30 0.20 


ing entries in the “Two-sided P” row, P = 0.30 and P = 0.20. This is enough to 
conclude that the data do not provide good evidence against the null hypothesis. 


Apply Your Knowledge 


17.16 Significance from a Table. A test of Hy: w = 0 against H,: w > O has 
test statistic y = 1.876. Is this test statistically significant at the 5% level 
(a = 0.05)? Is it statistically significant at the 1% level (a = 0.01)? 

17.17 Significance from a Table. A test of Hp: w = 0 against H,: w ~ O has 
test statistic y = 1.876. Is this test statistically significant at the 5% level 
(a = 0.05)? Is it statistically significant at the 1% level (a = 0.01)? 

17.18 Testing a Random Number Generator. A random number generator 
is supposed to produce random numbers that are uniformly distributed on 
the interval from 0 to 1. If this is true, the numbers generated come from 
a population with w = 0.5 and ao = 0.2887. A command to generate 100 
random numbers gives outcomes with mean x = 0.4365. Assume that the 
population o remains fixed. We want to test 


Ho: w = 0.5 

Hy pw # 0.5 
a) Calculate the value of the z test statistic. 
b) Use Table C: is z statistically significant at the 5% level (a = 0.05)? 
c) Use Table C: is z statistically significant at the 1% level (a = 0.01)? 


d) Between which two Normal critical values z* in the bottom row of 
Table C does z lie? Between what two numbers does the P-value lie? 
Does the test give good evidence against the null hypothesis? 
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In Chapter 15 (page 353) we learned that, for an SRS from a Normal popula- 
tion with mean pw and standard deviation o, the sampling distribution of x is also 
Normal with mean p and standard deviation o/ Vn. In Section 17.4, this fact 
allowed us to use software or a table to compute P-values. Knowledge of the sam- 
pling distribution of x computed from SRSs when the null hypothesis is true is the 
key to finding P-values because this allows us to carry out probability calculations. 

However, we also saw in Section 15.3 that we can approximate the sampling 
distribution of x by taking a very large number of SRSs of size n and construct- 
ing the histogram of the values of the sample means, x. This was done in Example 
15.4. Figure 15.2 (page 351) approximates the sampling distribution of x by plotting 
the results of many (1000) samples of size 10 from a Normal population with 
mean wt = 25 and standard deviation o = 7. Can we estimate P-values from such 
a histogram? Here is an example demonstrating how this can be done. 


FIGURE 17.6 

The histogram of the means of 
1000 SRSs of size 10 from a 
Normal population with mean 
be = O and standard deviation 
o = 1 for Example 17.9. 


In Examples 17.2, 17.3, and 17.5, we were interested in whether there was evidence 
of any sweetness loss for a diet cola. In particular, we tested the hypotheses 


Ho: w= O 
H,: w > O 


where sweetness loss score varies from one trained taster to another according to a 
Normal distribution with mean yu and standard deviation o@ = 1. Based on the mean 
of the sweetness loss score X of 10 trained testers, we computed P-values when 
X = 0.3 and when x = 1.02. Now we estimate these P-values from the histogram 
of the sample means of 1000 SRSs of size 10 computed (using software) under the 
assumption that the null hypothesis is true. 

Figure 17.6 is a histogram of the means of 1000 SRSs of size 10 from a Normal 
population with mean ww = O and standard deviation o = 1. We find that 163 of the 
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*This material can be skipped if you prefer not to cover resampling methods. 


17.6 Resampling: Significance from a Simulation 


samples produced a mean x of 0.3 or more. Only two samples produced a mean X of 
1.02 or more. Thus, based on the histogram, we would estimate the P-value of 0.3 to 
be 163/1000 = 0.163 and the P-value of 1.02 to be 2/1000 = 0.002. Compare 
these estimates to the exact P-values of 0.1714 and 0.0006 that we computed in 
Example 17.5. The simulated values are reasonably good approximations (and would 
be even better with more samples of size 10). m 
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This method of taking a large number of repeated SRSs from the population 
distribution when the null hypothesis is true and using these to approximate 
P-values is sometimes referred to as resampling.* It is a powerful method for esti- 
mating P-values and can be applied to tests of hypotheses based on test statistics 
other than sample means. All we need to know is the population distribution 
under the assumption that the null hypothesis is true. We then resample, using 
software, many times from this population distribution, compute the value of the 
sample statistic for each sample, and determine the proportion of times we ob- 
tained sample values as or more extreme than our actual data. This proportion is 
an estimate of the P-value. The more times we resample, the better our estimate 
will be. The next example illustrates the method using the sample median as a 
test statistic. 


In Example 15.10 (page 363), we speculated that adults aged 65 and older are 
less able to detect DMS than other adults. To test this, we selected an SRS of five 
adults from the population of all adults aged 65 and older. We presented each 
subject both natural wine and the same wine spiked with DMS at different con- 
centrations to find the lowest concentration (odor threshold) at which they identify 
the spiked wine. The median of the odor thresholds of the five subjects was 35 
micrograms per liter. 

Our null hypothesis is that the distribution of odor thresholds for adults aged 65 
and older is the same as that for all adults, namely, it is Normal with mean uw = 
25 micrograms per liter and standard deviation a = 7 micrograms per liter. If adults 
aged 65 and older are less able to detect DMS than other adults, their mean odor 
threshold should be larger than 25. Thus, our alternative hypothesis is u~ > 25 and 
our hypotheses are 


Ho: be = 25 
Hy pw > 25 


Recall that the mean and median of a normal population are the same. Thus, the 
sample median of the odor thresholds of the five subjects provides information about 
the population mean jw and we can use the median of an SRS of size 5 as our test 
statistic. To determine whether the sample median of 35 is evidence against Hy we 
need to know the sampling distribution of a sample median of size 5 computed from 
an SRS from a Normal population with mean jw = 25 and standard deviation 0 = 7. 
From this sampling distribution, we compute the P-value. The P-value is the probability, 
computed under the assumption that Hp is true, that the test statistic (in this case, the 
sample median) would take a value as large as or larger than that actually observed 
(in this case, 35). Unlike the sample mean, we haven’t discussed any mathematical 
results that tell us what this sampling distribution is. However, we simulated this sam- 
pling distribution in Example 15.9. Figure 17.7 reproduces this simulation of medians 
based on 1000 samples of size 5 from a Normal population with mean = 25 and 
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FIGURE 17.7 

The histogram of the medians of 
1000 SRSs of size 5 from a Normal 
population with mean w = 25 

and standard deviation o = 7 for 
Example 17.10. 


110 4 


100 - 


Number of samples 


Eight samples have 
median 35 or larger 


10 15 20 25 30 35 40 45 


standard deviation o = 7. The red bars correspond to samples with medians =35. 
There are eight such samples, so we estimate the P-value to be 8/1000 = 0.008. 
This is strong evidence against the null hypothesis and makes precise our previous 
statement in Example 15.10 that a median of 35 is statistically significant. In other 
words, these data provide strong evidence that adults aged 65 and older are less able 
to detect DMS than other adults. = 


We close with a few remarks. First, we must resample in the same manner as we 
obtain our data. If our data are obtained by an SRS, we resample by taking repeated 
SRSs from the population distribution determined by the null hypothesis. 

Second, resampling only provided an estimate of a P-value. Repeat the resampling 
and you will obtain a different estimate. Accuracy of the estimate is improved by 
taking a larger number of samples to estimate the sampling distribution. 

Finally, resampling requires the use of software. Some software packages are more 
convenient for resampling than others. 


(3) LaunchPad Online Resources 


* The Video Technology Manuals discuss how to do resampling for several 
software packages. 


Apply Your Knowledge 


17.19 Does This Wine Smell Bad? In Example 17.10, suppose the median of the 
odor thresholds of the sample of five older adults had been 33. Use Figure 17.7 
to estimate the P-value for testing 

lates w= 25) 
Jobaat => 725) 


What would you estimate the P-value to be if the sample median had been 30? 


17.20 Sweetening Colas, a Two-Sided P-Value. Suppose in Example 17.9 
we were interested in whether there was a change in sweetness. In this case, 
we would test 

lake oo 0 
H,: w #0 
Use Figure 17.6 to estimate the P-value if the mean of the scores of 10 tasters is 


x = 1.02. Remember, this involves a two-sided P-value, so you need to deter- 
mine the proportion of the 1000 sample means that are = 1.02 or = —1.02. 


CHAPTER 17 SUMMARY 


Chapter Specifics 


A test of significance assesses the evidence provided by data against a null hypothesis 
Hp in favor of an alternative hypothesis H,,. 


Hypotheses are always stated in terms of population parameters. Usually Hp is a 
statement that no effect is present, and H, says that a parameter differs from its null 
value in a specific direction (one-sided alternative) or in either direction (two-sided 
alternative). 


The essential reasoning of a test of significance is as follows. Suppose for the sake of 
argument that the null hypothesis is true. If we repeated our data production many 
times, would we often get data as inconsistent with Hp as the data we actually have? 
Data that would rarely occur if Hp were true provide evidence against Ho. 


A test is based on a test statistic that measures how far the sample outcome is from the 
value stated by Hp. 


The P-value of a test is the probability, computed supposing Hp to be true, that the 
test statistic will take a value at least as extreme as that actually observed. Small 
P-values indicate strong evidence against Hp. To calculate a P-value, we must know 
the sampling distribution of the test statistic when H, is true. 


If the P-value is as small as or smaller than a specified value a, the data are statistically 
significant at significance level a. 
Tests of significance for the null hypothesis Hp: w = fo concerning the unknown 
mean yp of a population are based on the one-sample z test statistic 

x= Ho 


oe 


The z test assumes an SRS of size n from a Normal population with known population 
standard deviation o. P-values can be obtained either with computations from the 
standard Normal distribution or by using technology (applet or software). 


(Optional material) P-values can be estimated from a simulation. This method can be 
applied to tests of hypotheses based on test statistics other than the sample mean. 


Link It 


In this chapter we discuss tests of significance, the second type of statistical inference. 
The mathematics of probability—in particular, the sampling distributions discussed in 
Chapter 15—provides the formal basis for a test of significance. The sampling distribu- 
tion allows us to assess “probabilistically” the strength of evidence against a null hypoth- 
esis, either through a level of significance or a P-value. The goal of hypothesis testing, 
which is used to assess the evidence provided by data about some claim concerning a 


Summary 
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population, is different from the goal of confidence interval estimation, which is used to 
estimate a population parameter. 

Although we apply the reasoning of tests of significance for the mean of a population 
that has a Normal distribution in a simple and artificial setting (we assume that we know 
the population standard deviation), we will use the same logic in future chapters to con- 
struct tests of significance for population parameters in more realistic settings. 


() LaunchPad Online Resources 


If you are having difficulty with any of the sections of this chapter, these online 

resources should help prepare you to solve the exercises at the end of this chapter. 

* StatTutor starts with a video review of each section and asks a series of 
questions to check your understanding. You may find this helpful if you need 
additional help understanding any of the material in this chapter. 


* LearningCurve provides you with a series of questions about the chapter geared 
to your level of understanding. 


CHECK YOUR SKILLS 


Check Your Skills 411 


17.21 


17.22 


17.23 


17.24 


17.25 


You use software to carry out a test of significance. 
The program tells you that the P-value is P = 0.008. 
You conclude that the probability, computed assum- 
ing that H, is 

(a) true, of the test statistic taking a value as 
extreme as or more extreme than that actually 
observed is 0.008. 

(b) true, of the test statistic taking a value as 
extreme as or less extreme than that actually 
observed is 0.008. 

(c) false, of the test statistic taking a value as 
extreme as or more extreme than that actually 
observed is 0.008. 

You use software to carry out a test of significance. 
The program tells you that the P-value is P = 0.008. 
This result is 


17.26 


(a) not statistically significant at either a = 0.05 or 
a = 0.01. 

(b) statistically significant at a = 0.05 but not at 

a = 0.01. 

(c) statistically significant at both a = 0.05 and 

a = 0.01. 


The x statistic for a one-sided test is y = 2.41. This 
test is 


17.27 


(a) not statistically significant at either a = 0.05 or 
a= 0.01. 

(b) statistically significant at a = 0.05 but not at 

a =0.01. 

(c) statistically significant at both a = 0.05 and 
a= 0.01. 


The gas mileage for a particular model car is known 
to have a standard deviation of a = 1.0 mile per 
gallon in repeated tests in a controlled laboratory 
environment at a fixed speed. For a fixed speed, gas 
mileages in repeated tests are Normally distributed. 
Tests on three cars of this model at 65 miles per hour 
give gas mileages of 24.3, 24.9, and 24.8 miles per 
gallon. The z statistic for testing Ho: w = 25 miles 
per gallon based on these three measurements is 


(a) z = —0.333. 
(b) zy = —0.577. 
(c) z = 0.577. 


Experiments on learning in animals sometimes mea- 
sure how long it takes mice to find their way through 
a maze. The mean time is 18 seconds for one par- 
ticular maze. A researcher thinks that a loud noise 


17.28 


17.29 


will cause the mice 
to complete the 
maze faster. She 
measures how long 
each of 10 mice 
takes with a noise 
as stimulus. The 
sample mean is 
x = 16.5 seconds. The null hypothesis for the test of 
significance is 


yi 


Garry Gay/Getty Images 


(a) Ho: w = 18. 
(b) Ho: w = 16.5. 
(c) Ho: w < 18. 


The alternative hypothesis for the test in Exercise 
17.25 is 

(a) He we = 18. 

(b) Hy: w < 18. 

(c) Hy: w = 16.5. 

Researchers investigated the effectiveness of oral 
zinc, as compared to a placebo, in reducing the 
duration of the common cold when taken within 
24 hours of the onset of symptoms. The researchers 
found those taking oral zinc had a statistically sig- 
nificantly shorter duration (P < 0.05) than those 
taking the placebo.’ This means that 


(a) the probability that the null hypothesis is true is 
less than 0.05. 

(b) the value of the test statistic, the mean reduc- 
tion in duration of the cold, is large. 

(c) neither of the above is true. 


You are testing Hp: ww = O against Hy: ww # 0 based 
on an SRS of 20 observations from a Normal popula- 
tion. What values of the z statistic are statistically 
significant at the a = 0.005 level? 


(a) All values for which xy > 2.576 

(b) All values for which y > 2.807 

(c) All values for which | z | SS Mili 

You are testing Hy: w= 0 against Hy: uw > 0 
based on an SRS of 20 observations from a Normal 
population. What values of the z statistic are statisti- 
cally significant at the a = 0.005 level? 


(a) All values for which xy > 2.576 
(b) All values for which z > 2.807 
(c) All values for which | z | => 25l07) 
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| CHAPTER 17 EXERCISES 


In all exercises that call for P-values, give the actual value if you 
use software or the P-Value applet. Otherwise, use Table C to give 
values between which P must fall. 


17.30 


17.31 


17.32 


Student study times. Exercise 16.19 (page 387) de- 
scribes a class survey in which students claimed to 
study an average of x = 15.3 hours in a typical week. 
Regard these students as an SRS from the population 
of all undergraduate students at this university. Does 
the study give good evidence that students claim to 
study more than 15 hours per week on the average? 


(a) State null and alternative hypotheses in terms of 
the mean study time in hours for the population. 

(b) What is the value of the test statistic z? 

(c) What is the P-value of the test? Can you conclude 
that students do claim to study more than 15 hours 
per week on the average? 


I want more muscle. If young men thought that 
their own level of muscle was about what women 
prefer, the mean “muscle gap” in the study described 
in Exercise 16.20 (page 387) would be 0. We suspect 
(before seeing the data) that young men think women 
prefer more muscle than they themselves have. 


(a) State null and alternative hypotheses for testing 
this suspicion. 

(b) What is the value of the test statistic z? 

(c) You can tell just from the value of z that the 
evidence in favor of the alternative is very strong 
(that is, the P-value is very small). Explain why this 
is true. 


Hotel managers’ personalities. Successful hotel 
managers must have personality characteristics often 
thought of as feminine (such as “compassionate”) as 
well as those often thought of as masculine (such as 
“forceful”). The Bem Sex-Role Inventory (BSRI) is a 
personality test that gives separate ratings for female 
and male stereotypes, both on a scale of 1 to 7. A 
sample of 148 male general managers of three-star 
and four-star hotels had mean BSRI femininity score 
y = 5.29.6 The mean score for the general male 
population is w = 5.19. Do hotel managers, on the 
average, differ statistically significantly in femininity 
score from men in general? Assume that the stan- 
dard deviation of scores in the population of all male 
hotel managers is the same as the o = 0.78 for the 
adult male population. 


(a) State null and alternative hypotheses in terms 
of the mean femininity score ps for male hotel 
managers. 

(b) Find the z test statistic. 

(c) What is the P-value for your 7? What do you 
conclude about male hotel managers? 


17.33 


17.34 


17.35 


Is this what P means? A randomized comparative 
experiment examined the effect of a technique for 
improving one’s ability to focus one’s attention on 
performance of undergraduate students on the Ver- 
bal portion of the Graduate Record Exam (GRE). 
The researchers found a statistically significant 
improvement in performance on the Verbal GRE 
(P < 0.05).’ When asked to explain the meaning 
of “P < 0.05),” a student says, “This means there is 
only probability less than 0.05 that the null hypoth- 
esis is true.” Explain what P < 0.05 really means in 
a way that makes it clear that the student’s explana- 
tion is wrong. 


How to show that you are rich. Every society has 
its own marks of wealth and prestige. In ancient 
China, it appears that owning pigs was such a mark. 
Evidence comes from examining burial sites. The 
skulls of sacrificed pigs tend to appear along with 
expensive ornaments, which suggests that the pigs, 
like the ornaments, signal the wealth and prestige 
of the person buried. A study of burials from around 
3500 B.c. concluded that “there are striking differ- 
ences in grave goods between burials with pig skulls 
and burials without them. ... A test indicates that 
the two samples of total artifacts are statistically sig- 
nificantly different at the 0.01 level.”® Explain clearly 
why “statistically significantly different at the 0.01 
level” gives good reason to think that there really is 
a systematic difference between burials that contain 
pig skulls and those that lack them. 


Alleviating test anxiety. Research suggests that 
pressure to perform well can reduce performance 
on exams. Are there effective strategies to deal with 
pressure? In an experiment, researchers had students 
take a test on mathematical skills. The same students 
were asked to take a second test on the same skills, 
but now each student was paired with a partner and 
only if both improved their scores would they receive 
a monetary reward for participating in the experi- 
ment. They were also told that their performance 
would be videotaped and watched by teachers and 
students. To help them cope with the pressure, ten 
minutes before the second exam they were asked to 
write as candidly as possible about their thoughts 
and feelings regarding the exam. “Students who 
expressed their thoughts before the high-pressure 
test showed a statistically significant 5% math ac- 
curacy improvement from the pretest to posttest” 
(P< 0.03).? A colleague who knows no statistics 
says that an increase of 5% isn’t a lot—maybe it’s just 
an accident due to natural variation among the stu- 
dents. Explain in simple language how “P < 0.03” 
answers this objection. 


17.36 


17.37 
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17.39 


Healthy body, healthy mind? In the study described in 
Exercise 17.33, researchers also examined the effect of 
a nutrition program on performance of undergraduates 
on the Verbal GRE. The researchers found no evidence 
of improvement (P > 0.5). The P-value refers to a 
null hypothesis of “no change” in GRE scores measured 
before and after the nutrition program. Explain clearly 
why this value provides no evidence of change. 


5% versus 1%. Sketch the standard Normal curve 
for the x test statistic and mark off areas under the 
curve to show why a value of z that is statistically 
significant at the 1% level in a one-sided test is al- 
ways statistically significant at the 5% level. If z is 
statistically significant at the 5% level, what can you 
say about its significance at the 1% level? 


The wrong alternative. A graduate student is com- 
paring final-exam test scores of male and female 
students in an introductory physics class. She starts 
with no expectations as to which sex will score more 
highly. After seeing that men did better than women 
on the first quiz, she tests a one-sided alternative 
about the mean final-exam scores, 


Ho: Mm = Pe 
Hg bm > be 
She finds z = 1.9 with one-sided P-value P = 0.0287. 


(a) Explain why she should have used the two-sided 
alternative hypothesis. 
(b) What is the correct P-value for y = 1.9? 


The wrong P. The report of a study of seat belt 
use by drivers says, “Hispanic drivers were not sta- 
tistically significantly more likely than White/non- 
Hispanic drivers to overreport safety belt use (27.4 
vs. 21.1%, respectively; z = 1.33, P > 1.0).”!? How 
do you know that the P-value given is incorrect? 
What is the correct one-sided P-value for test sta- 
tistic 7 = 1.33? 


Exercises 17.40 to 17.43 ask you to answer questions from data. As- 
sume that the “simple conditions” hold in each case. The exercise state- 
ments give you the State step of the four-step process. In your work, 
follow the Plan, Solve, and Conclude steps, illustrated in Example 
16.3 (page 381) for a confidence interval and in Example 17.7 for a 
test of significance. 


17.40 


Pulling wood apart. How heavy a load (pounds) is 
needed to pull apart pieces of Douglas fir 4 inches 
long and 1.5 inches square? Here are data from stu- 
dents doing a laboratory exercise: £f),) woop 


33,190 31,860 32,590 26,520 33,280 
32,320 33,020 32,030 30,460 32,700 
23,040 30,930 32,720 33,650 32,340 
24,050 30,170 31,300 28,730 31,920 


17.41 


Exercises 
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We are willing to regard the wood pieces prepared 
for the lab session as an SRS of all similar pieces of 
Douglas fir. Engineers also commonly assume that 
characteristics of materials vary Normally. Suppose 
that the strength of pieces of wood like these fol- 
lows a Normal distribution with standard deviation 
3000 pounds. 


(a) Is there statistically significant evidence at 
the a = 0.10 level against the hypothesis that 
the mean is 32,500 pounds for the two-sided 
alternative? 

(b) Is there statistically significant evidence at 
the a = 0.10 level against the hypothesis that 
the mean is 31,500 pounds for the two-sided 
alternative? 


Bone loss by nursing mothers. As discussed in 
Exercise 16.26 (page 388), breastfeeding moth- 
ers secrete calcium into their milk. Some of the 
calcium may come from their bones, so mothers 
may lose bone mineral. Researchers measured the 
percent change in mineral content of the spines of 
47 mothers during three months of breastfeeding."! 
Here are the data: 8yj)) BoNeLs 


Al S25) =A) =o) =O) =5,3) Scho) Sil SOx) =4)3} 
Jf, —=Fasy —=3oil 10) =,5) gh =5,7 =5,f =—10) =22 
6.5 140) $10) si) = 57 740) SW sy =abab 33) 
Al) 2p) i 3h) = 5.8) 25) —Ohs} =, = oye) We 
O93 S25) Wy 5.5 O07, =i =. 
The researchers are willing to consider these 47 
women as an SRS from the population of all nurs- 
ing mothers. Suppose that the percent change in 
this population has a Normal distribution with stan- 
dard deviation @ = 2.5%. Do these data give good 
evidence that, on the average, nursing mothers lose 
bone mineral? 
17.42 This wine stinks. Sulfur compounds cause “off- 


odors” in wine, so winemakers want to know the odor 
threshold, the lowest concentration of a compound 
that the human nose can detect. The odor threshold 
for dimethyl sulfide (DMS) in trained wine tasters is 
about 25 micrograms per liter of wine (ug/L). The 
untrained noses of consumers may be less sensitive, 
however. Here are the DMS odor thresholds for 
10 untrained students: 


30 30 42 35 22 3h Sl BW I Zz 


Assume that the odor threshold for untrained noses 
is Normally distributed with o = 7 ug/L. Is there 
evidence that the mean threshold for untrained 
tasters is greater than 25 ug/L? 
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Eye grease. Athletes performing in bright sunlight 
often smear black eye grease under their eyes to 
reduce glare. Does eye grease work? In one study, 
16 student subjects took a test of sensitivity to con- 
trast after three hours facing into bright sun, both 
with and without eye grease. This is a matched pairs 
design. Here are the differences in sensitivity, with 
eye grease minus without eye grease:’” £y))) Evers 


(005) 
0.24 


= 0:18) 0145 O16) 0103 
=O O289 1010550129 


0.07 0.64 
0.05 0.02 


= 
0.43 


We want to know whether eye grease increases sen- 
sitivity on the average. 


(a) What are the null and alternative hypotheses? 
Say in words what mean pw your hypotheses 
concern. 

(b) Suppose that the subjects are an SRS of all 
young people with normal vision, that contrast dif- 
ferences follow a Normal distribution in this popula- 
tion, and that the standard deviation of differences 
is 0 = 0.22. Carry out a test of significance. 

Tests from confidence intervals. A confidence 
interval for the population mean yp tells us which 
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Exploring the Web 


values of w are plausible (those inside the interval) 
and which values are not plausible (those outside the 
interval) at the chosen level of confidence. You can 
use this idea to carry out a test of any null hypoth- 
esis Hp: 6 = Mo starting with a confidence interval: 
reject Hy if fo is outside the interval and fail to reject if 
[Ao is inside the interval. 

The alternative hypothesis is always two-sided, 
H,: w # fo, because the confidence interval ex- 
tends in both directions from x. A 95% confidence 
interval leads to a test at the 5% significance level 
because the interval is wrong 5% of the time. In gen- 
eral, confidence level C leads to a test at significance 
levela =1-—C. 


(a) In Example 17.7, a medical director found mean 
blood pressure x = 126.07 for an SRS of 72 execu- 
tives. The standard deviation of the blood pressures of 
all executives is 9 = 15. Give a 90% confidence inter- 
val for the mean blood pressure yx of all executives. 

(b) The hypothesized value frp = 128 falls inside 
this confidence interval. Carry out the z test for 

Ho: w = 128 against the two-sided alternative. 
Show that the test is not statistically significant at the 
10% level. 

(c) The hypothesized value wy = 129 falls outside 
this confidence interval. Carry out the x test for 

Ho: w = 129 against the two-sided alternative. Show 
that the test is statistically significant at the 10% level. 


Tests from confidence intervals. A 95% confidence 
interval for a population mean is 30.7 + 3.2. Use 
the method described in Exercise 17.44 to answer 
these questions. 


(a) With a two-sided alternative, can you reject the 
null hypothesis that 4 = 33 at the 5% (a = 0.05) 
significance level? Why? 

(b) With a two-sided alternative, can you reject the 
null hypothesis that ~ = 34 at the 5% significance 
level? Why? 


Significance in journals. Choose a major journal in your field of study. Use a 


web search engine to find its website—just search on the journal’s name. Find a 
paper that uses a phrase like “significant (P = 0.01)” and summarize the findings 


in the paper. 
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A statistics glossary. An editorial was published in the Journal of the Nation- 


al Cancer Institute, Vol. 101, No. 23 (December 2, 2009) that announced some 
online resources for journalists, including a statistics glossary. The glossary can be 
found at www. oxfordjournals.org/our_journals/jnc/resource/ 
statistics%20glossary.pd£. Read the definition of a P-value. Is this an 
accurate definition? Explain your answer. 
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nference in Practice 


o this point, we have met just two procedures for statistical inference. Both 

concern inference about the mean y of a population when the “simple condi- 

tions” (page 374) are true: the data are an SRS, the population has a Normal 
distribution, and we know the standard deviation o of the population. Under these 
conditions, a confidence interval for the mean wp is 


Pe, 


Vn 


To test Ho: fw = Mo, we use the one-sample z statistic: 


Re ae 


pelle 
a/Vn 


We call these z procedures because they both start with the one-sample z statistic 
and use the standard Normal distribution. 

In later chapters, we will modify these procedures for inference about a popula- 
tion mean to make them useful in practice. We will also introduce procedures for 
confidence intervals and tests in most of the settings we met in learning to explore 
data. There are libraries—both of books and of software—full of more elaborate 
statistical techniques. The reasoning of confidence intervals and tests is the same, 
no matter how elaborate the details of the procedure are. 


CHAPTER 


° 
O 


In this chapter 
we cover... 


18.1 Conditions for inference in 
practice 


18.2 Cautions about confidence 
intervals 


18.3 Cautions about significance 
tests 


18.4 Planning studies: sample size 
for confidence intervals 


18.5 Planning studies: the power 
of a statistical test* 


z procedures 
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AIS] DON’T TOUCH 
wih THE PLANTS 


We know that confounding can 
distort inference, however, we 

don’t always recognize how easy it 

is to confound data. Consider the 
innocent scientist who visits plants 
in the field once a week to measure 
their size. To measure the plants, she 
has to touch them. A study of six 
plant species found that one touch 

a week significantly increased leaf 
damage by insects in two species and 
significantly decreased damage in 
another species. 


Inference in Practice 


There is a saying among statisticians that “mathematical theorems are true; 
statistical methods are effective when used with judgment.” That the one-sample 
z statistic has the standard Normal distribution when the null hypothesis is true is 
a mathematical theorem. Effective use of statistical methods requires more than 
knowing such facts. It requires even more than understanding the underlying 
reasoning. This chapter begins the process of helping you develop the judgment 
needed to use statistics in practice. That process will continue in examples and 
exercises through the rest of this book. 


Any confidence interval or significance test can be trusted only under specific 

A conditions. It’s up to you to understand these conditions and judge whether 

they fit your problem. With that in mind, let’s look back at the “simple 
conditions” for the z procedures. 

The final “simple condition,” that we know the standard deviation o of the 
population, is rarely satisfied in practice. The z procedures are therefore of little 
practical use. Fortunately, it’s easy to remove the “known a” condition. Chapter 20 
shows how. The condition that the size of the population is large compared to the 
size of the sample is often easy to verify, and when it is not satisfied there are spe- 
cial, advanced methods for inference. The other “simple conditions” (SRS, Normal 
population) are harder to escape. In fact, they represent the kinds of conditions 
needed if we are to trust almost any statistical inference. As you plan inference, you 
should always ask, “Where did the data come from?” and you must often also ask, 
“What is the shape of the population distribution?” This is the point where know- 
ing mathematical facts gives way to the need for judgment. 


Where did the data come from? The most important requirement for any 
inference procedure is that the data come from a process to which the laws of prob- 
ability apply. Inference is most reliable when the data come from a random sample 
or a randomized comparative experiment. Random samples use chance to choose 
respondents. Randomized comparative experiments use chance to assign subjects to 
treatments. The deliberate use of chance ensures that the laws of probability apply 
to the outcomes, and this in turn ensures that statistical inference makes sense. 


Where the Data Come from Matters 


When you use statistical inference, you are acting as if your data are a random sample 
or come from a randomized comparative experiment. 


If your data don't come from a random sample or a randomized comparative 

A experiment, your conclusions may be challenged. To answer the challenge, you 

must usually rely on subject-matter knowledge, not on statistics. It is com- 

mon to apply statistical inference to data that are not produced by random selec- 

tion. When you see such a study, ask whether the data can be trusted as a basis for 
the conclusions of the study. 


A psychologist is interested in how our visual perception can be fooled by optical 
illusions. Her subjects are students in Psychology 101 at her university. Most psy- 
chologists would agree that it’s safe to treat the students as an SRS of all people 
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with normal vision. There is nothing special about being a student that changes visual 
perception. 

A sociologist at the same university uses students in Sociology 101 to examine 
attitudes toward poor people and antipoverty programs. Students as a group are 
younger than the adult population as a whole. Even among young people, students 
aS a group come from more prosperous and better-educated homes. Even among 
students, this university isn’t typical of all campuses. Even on this campus, students in 
a sociology course may have opinions that are quite different from those of engineer- 
ing students. The sociologist can’t reasonably act as if these students are a random 
sample from any interesting population. m 


Our first examples of inference, using the z procedures, act as if the data are 


an SRS from the population of interest. Let’s look back at the examples in 
Chapters 16 and 17. 


The NHANES survey that produced the BMI data for Example 16.1 used a complex 
multistage sample design, so it’s a bit oversimplified to treat the BMI data as coming 
from an SRS from the population of young women.' Although the overall effect of 
the NHANES sample is close to an SRS, professional statisticians would use more 
complex inference procedures to match the more complex design of the sample. 

The 20 patrons in the tipping study in Example 16.3 were chosen from those 
eating at a particular restaurant to receive one of several treatments being compared 
in a randomized comparative experiment. Recall that each treatment group in a 
completely randomized experiment is an SRS of the available subjects. Researchers 
sometimes act as if the available subjects are an SRS from some population if there is 
nothing special about where the subjects came from. In some cases, researchers col- 
lect demographic data on subjects to help justify the assumption that the subjects are 
a representative sample from some population. We are willing to regard the subjects 
as an SRS from the population of patrons of this particular restaurant, but perhaps 
this needs to be explored further. 

The cola taste test in Examples 17.2 and 17.3 uses scores from 10 tasters. All 
were examined to be sure that they have no medical condition that interferes with nor- 
mal taste and then carefully trained to score sweetness using a set of standard drinks. 
We are willing to take their scores as an SRS from the population of trained tasters. 

The medical director who examined executives’ blood pressures in Example 17.7 
actually chose an SRS from the medical records of all executives in this company. m 


These examples are typical. One is an actual SRS, two are situations in which 

common practice is to act as if the sample were an SRS, and in the remaining 

example procedures that assume an SRS are used for a quick analysis of 
A data from a more complex random sample. There is no simple rule for decid- 
ing when you can act as if a sample is an SRS. Pay attention to these cautions: 


Practical problems such as nonresponse in samples or dropouts from an experiment can 
hinder inference even from a well-designed study. The NHANES survey has about 
an 80% response rate. This is much higher than opinion polls and most other 
national surveys, so by realistic standards NHANES data are quite trustworthy. 
(NHANES uses advanced methods to try to correct for nonresponse, but these 
methods work a lot better when response is high to start with.) 
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AIG] REALLY WRONG 
ih NUMBERS 


By now you know that “statistics” 
that don’t come from properly 
designed studies are often dubious 
and sometimes just made up. It’s 

rare to find wrong numbers that 
anyone can see are wrong, but it does 
happen. A German physicist claimed 
that 2006 was the first year since 
1441 with more than one Friday the 
13th. Sorry: Friday the 13th occurred 
in February and August of 2004, 
which is a bit more recent than 1441. 


Inference in Practice 


» Different methods are needed for different designs. The z procedures aren’t correct 
for random sampling designs more complex than an SRS. Later chapters give 
methods for some other designs, but we won’t discuss inference for really 
complex designs like that used by NHANES. Always be sure that you (or your 
statistical consultant) know how to carry out the inference your design calls for. 


« There is no cure for fundamental flaws like voluntary response surveys or 
uncontrolled experiments. Look back at the bad examples in Chapters 8 and 9 
and steel yourself to just ignore data from such studies. 


What is the shape of the population distribution? Most statistical infer- 
ence procedures require some conditions on the shape of the population distribu- 
tion. Many of the most basic methods of inference are designed for Normal popula- 
tions. That’s the case for the x procedures and also for the more practical procedures 
for inference about means that we will meet in Chapters 20 and 21. Fortunately, this 
condition is less essential than where the data come from. 

This is true because the zy procedures and many other procedures designed for 
Normal distributions are based on Normality of the sample mean x, not Normality 
of individual observations. The central limit theorem tells us that x is more Normal 
than the individual observations and that x becomes more Normal as the size of 
the sample increases. In practice, the z procedures are reasonably accurate for any 
roughly symmetric distribution for samples of even moderate size. If the sample 
is large, x will be close to Normal even if individual measurements are strongly 
skewed, as Figures 15.4 (pages 356-7) and 15.5 (page 358) illustrate. Later chapters 
give practical guidelines for specific inference procedures. 

There is one important exception to the principle that the shape of the popula- 
tion is less critical than how the data were produced. Outliers can distort the results 

of inference. Any inference procedure based on sample statistics like the sample 
A mean X that are not resistant to outliers can be strongly influenced by a few 
extreme observations. 

We rarely know the shape of the population distribution. In practice, we rely on 
previous studies and on data analysis. Sometimes long experience suggests that our 
data are likely to come from a roughly Normal distribution, or not. For example, 
heights of people of the same sex and similar ages are close to Normal, but weights 
are not. Always explore your data before doing inference. When the data are chosen 
at random from a population, the shape of the data distribution mirrors the shape 
of the population distribution. Make a stemplot or histogram of your data and look 
to see whether the shape is roughly Normal. Remember that small samples have a 
lot of chance variation, so that Normality is hard to judge from just a few observa- 
tions. Always look for outliers and try to correct them or justify their removal before 
performing the z procedures or other inference based on statistics like x that are 
not resistant. 

When outliers are present or the data suggest that the population is strongly 
non-Normal, consider alternative methods that don’t require Normality and are 
not sensitive to outliers. Some of these methods appear in Chapter 28 (available 
online). 


() LaunchPad Online Resources 


° The StatBoards video, Effect of Outliers on a Confidence Interval and Test, 
provides examples of how outliers can affect inference procedures about a 
population mean. 


18.2 Cautions about Confidence Intervals 
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Apply Your Knowledge 


18.1 Rate the Lecture. A professor is interested in how the 500 students in his 
class will rate today’s lecture. He selects the first 20 students on his class list, 
reads the names at the beginning of the lecture, and asks them to go online to 
the course website after class and rate the lecture on a scale of 0 to 5. Which 
of the following is the most important reason why a confidence interval 
for the mean rating by all his students based on these data is of little use? 
Comment briefly on each reason to explain your answer. 


(a) The number of students selected is small, so the margin of error will be large. 
(b) Many of the students selected may not be in attendance and/or will 
not respond. 


(c) The students selected can’t be considered a random sample from the 
population of all students in the course. 


18.2 Running Red Lights. A survey of licensed drivers inquired about run- 
ning red lights. One question asked, “Of every ten motorists who run a red 
light, about how many do you think will be caught?” The mean result for 880 
respondents was x = 1.92 and the standard deviation was s = 1.83.’ For 
this large sample, s will be close to the population standard deviation o, so 
suppose we know that 0 = 1.83. 

(a) Give a 95% confidence interval for the mean opinion in the population 
of all licensed drivers. 

(b) The distribution of responses is skewed to the right rather than 
Normal. This will not strongly affect the zy confidence interval for this 
sample. Why not? 

(c) The 880 respondents are an SRS from completed calls among 45,956 calls 
to randomly chosen residential telephone numbers listed in telephone 
directories. Only 5029 of the calls were completed. This information 
gives two reasons to suspect that the sample may not represent all 
licensed drivers. What are these reasons? 

18.3 Sampling Shoppers. A marketing consultant observes 50 consecutive 
shoppers at a department store the Friday after Thanksgiving, recording how 
much each shopper spends in the store. Suggest some reasons why it may be 
risky to act as if 50 consecutive shoppers at this particular time are an SRS of 
all shoppers at this store. 


The most important caution about confidence intervals in general is a conse- 
quence of the use of a sampling distribution. A sampling distribution shows how a 
statistic such as X varies in repeated random sampling. This variation causes random 
sampling error because the statistic misses the true parameter by a random amount. 

No other source of variation or bias in the sample data influences the sam- 
A pling distribution. So the margin of error in a confidence interval ignores every- 
thing except the sample-to-sample variation due to choosing the sample randomly. 


The Margin of Error Doesn’t Cover All Errors 


The margin of error in a confidence interval covers only random sampling errors. 


Practical difficulties such as undercoverage and nonresponse are often more serious than 
random sampling error. The margin of error does not take such difficulties into account. 


© Ilene MacDonald/Alamy 
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Recall from Chapter 8 that national opinion polls often have response rates less 
than 50%, and that even small changes in the wording of questions can strongly 
influence results. In such cases, the announced margin of error is probably unre- 
alistically small. And of course there is no way to assign a meaningful margin of 
error to results from voluntary response or convenience samples, because there 
is no random selection. Look carefully at the details of a study before you trust a 


confidence interval. 


Apply Your Knowledge 


18.4 


18.5 


18.6 


18.7 


What’s Your Weight? A 2013 Gallup Poll asked a national random sample 
of 477 adult women to state their current weight. The mean weight in the 
sample was x = 157. We will treat these data as an SRS from a Normally 
distributed population with standard deviation 0 = 35. 


(a) Give a 95% confidence interval for the mean weight of adult women 
based on these data. 


(b) Do you trust the interval you computed in part (a) as a 95% confidence 
interval for the mean weight of all U.S. adult women? Why or why not? 


Good Weather, Good Tips? Example 16.3 (page 381) described an exper- 
iment exploring the size of the tip in a particular restaurant when a message 
indicating that the next day’s weather would be good was written on the bill. 
You work part-time as a server in a restaurant. You read a newspaper article 
about the study that reports that with 95% confidence the mean percentage 
tip from restaurant patrons will be between 21.33 and 23.09 when the server 
writes a message on the bill stating that the next day’s weather will be good. 
Can you conclude that if you begin writing a message on patrons’ bills that 
the next day’s weather will be good, approximately 95% of the days you work 
your mean percentage tip will be between 21.33 and 23.09? Why or why not? 


Sample Size and Margin of Error. Example 16.1 (page 375) described 
NHANES data on the body mass index (BMI) of 654 young women. The 
mean BMI in the sample was x = 26.8. We treated these data as an SRS from 
a Normally distributed population with standard deviation 0 = 7.5. 


(a) Suppose that we had an SRS of just 100 young women. What would 
be the margin of error for 95% confidence? 


(b) Find the margins of error for 95% confidence based on SRSs of 
400 young women and 1600 young women. 


(c) Compare the three margins of error. How does increasing the sample 
size change the margin of error of a confidence interval when the 
confidence level and population standard deviation remain the same? 


Do You Eat Fast Food? A July 2013 Gallup poll asked, “How often, if ever, 
do you eat at fast-food restaurants, including drive-thru, take-out, and sit- 
ting down in the restaurant?” Of those sampled, 47% indicated they do so at 
least weekly. Gallup announced the poll’s margin of error for 95% confidence 
as +3 percentage points. Which of the following sources of error are in- 
cluded in this margin of error? 


(a) Gallup dialed landline telephone numbers at random and so missed all 
people without landline phones, including people whose only phone is 
a cell phone. 


(b) Some people whose numbers were chosen never answered the phone 
in several calls or answered but refused to participate in the poll. 


(c) There is chance variation in the random selection of telephone numbers. 


18.3 Cautions about Significance Tests 


Significance tests are widely used in most areas of statistical work. New phar- 
maceutical products require significant evidence of effectiveness and safety. Courts 
inquire about statistical significance in hearing class action discrimination cases. 
Marketers want to know whether a new package design will significantly increase 
sales. Medical researchers want to know whether a new therapy performs signifi- 
cantly better. In all these uses, statistical significance is valued because it points to 
an effect that is unlikely to occur simply by chance. Here are some points to keep 
in mind when you use or interpret significance tests. 


How small a P is convincing? The purpose of a test of significance is to 
describe the degree of evidence provided by the sample against the null hypothesis. 
The P-value does this. But how small a P-value is convincing evidence against the 
null hypothesis? This depends mainly on two circumstances: 


= How plausible is Ho? If Hp represents an assumption that the people you must 
convince have believed for years, strong evidence (small P) will be needed to 
persuade them. 


« What are the consequences of rejecting H)? If rejecting Ho in favor of H, means 
making an expensive changeover from one type of product packaging to 
another, you need strong evidence that the new packaging will boost sales. 


These criteria are a bit subjective. Different people will often insist on different 
levels of significance. Giving the P-value allows each of us to decide individually if 
the evidence is sufficiently strong. 

Users of statistics have often emphasized standard levels of significance such 
as 10%, 5%, and 1%. For example, courts have tended to accept 5% as a standard 
in discrimination cases.’ This emphasis reflects the time when tables of critical 
values rather than software dominated statistical practice. The 5% level (a = 0.05) 
is particularly common. There is no sharp border between “significant” and “not significant,” 

only increasingly strong evidence as the P-value decreases. There is no practical 
BD sicvcin between the P-values 0.049 and 0.051. It makes no sense to treat 
P =< 0.05 as a universal rule for what is significant. 


Significance depends on the alternative hypothesis You may have 
noticed that the P-value for a one-sided test is one-half the P-value for the two- 
sided test of the same null hypothesis based on the same data. The two-sided 
P-value combines two equal areas, one in each tail of a Normal curve. The one- 
sided P-value is just one of these areas, in the direction specified by the alterna- 
tive hypothesis. It makes sense that the evidence against Hp is stronger when the 
alternative is one-sided, because it is based on the data plus information about the 
direction of possible deviations from Hp. If you lack this added information, always 
use a two-sided alternative hypothesis. 


Significance depends on sample size A sample survey shows that sig- 
nificantly fewer students are heavy drinkers at colleges that ban alcohol on campus. 
“Significantly fewer” is not enough information to decide whether there is an 

A important difference in drinking behavior at schools that ban alcohol. How 
important an effect is depends on the size of the effect as well as on its statisti- 

cal significance. If the number of heavy drinkers is only 1% less at colleges that ban 
alcohol than at other colleges, this is probably not an important effect even if it is 
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if SHOULD TESTS 
| BE BANNED? 


Significance tests don’t tell us how 
large or how important an effect 
is. Research in psychology has 
emphasized tests, so much so that 
some think their weaknesses should 
ban them from use. The American 
Psychological Association asked a 
group of experts. They said: “Use 
anything that sheds light on your 
study. Use more data analysis and 
confidence intervals.” But: “The 
task force does not support any 
action that could be interpreted as 
banning the use of null hypothesis 
significance testing or P-values 

in psychological research and 
publication.” 


Inference in Practice 


statistically significant. (Ask yourself if you would describe 1% as “significantly” fewer 
when talking with a friend). In fact, the sample survey found that 38% of students at 
colleges that ban alcohol are “heavy episodic drinkers” compared with 48% at other 
colleges.* That difference is large enough to be important. (Of course, this observa- 
tional study doesn’t prove that an alcohol ban directly reduces drinking; it may be 
that colleges that ban alcohol attract more students who don’t want to drink heavily.) 

Such examples remind us to always look at the size of an effect (like 38% versus 
48%) as well as its significance. They also raise a question: can a tiny effect really be 
highly significant? Yes. The behavior of the z test statistic is typical. The statistic is 


_ X= Mo 


ea/ Va 


The numerator measures how far the sample mean deviates from the hypoth- 
esized mean fo. Larger values of the numerator give stronger evidence against 
Ho: &@ = fo. The denominator is the standard deviation of x. It measures how 
much random variation we expect. There is less variation when the number of 
observations n is large. So z gets larger (more significant) when the estimated effect 
X — Mo gets larger or when the number of observations n gets larger. Significance 
depends both on the size of the effect we observe and on the size of the sample. 
Understanding this fact is essential to understanding significance tests. 


Sample Size Affects Statistical Significance 


Because large random samples have small chance variation, very small population 
effects can be highly significant if the sample is large. 


Because small random samples have a lot of chance variation, even large population 
effects can fail to be significant if the sample is small. 


Statistical significance does not tell us whether an effect is large enough to be 
important. That is, statistical significance is not the same thing as practical 
significance. 


Keep in mind that “statistical significance” means “the sample showed an 
effect larger than would often occur just by chance.” The extent of chance varia- 
tion changes with the size of the sample, so the size of the sample does matter. 
Exercise 18.9 demonstrates in detail how increasing the sample size drives down 
the P-value. Here is another example. 


We are testing the hypothesis of no correlation between two variables. With 1000 
observations, an observed correlation of only r = 0.08 is significant evidence at 
the 1% level that the correlation in the population is not zero but positive. The small 
P-value does not mean that there is a strong association, only that there is strong 
evidence of some association. The true population correlation is probably quite close 
to the observed sample value, r = 0.08. We might well conclude that for practical 
purposes we can ignore the association between these variables, even though we are 
confident (at the 1% level) that the correlation is positive. 

On the other hand, if we have only 10 observations, a correlation of r = 0.5 is 
not significantly greater than zero even at the 5% level. Small samples vary so much 
that a large r is needed if we are to be confident that we aren't just seeing chance 
variation at work. So a small sample will often fall short of significance even if the 
true population correlation is quite large. m 


18.3 Cautions about Significance Tests 


Beware of multiple analyses Statistical significance ought to mean that 
you have found an effect that you were looking for. The reasoning behind statistical 
significance works well if you decide what effect you are seeking, design a study to 
search for it, and use a test of significance to weigh the evidence you get. In other 
settings, significance may have little meaning. 


Might the radiation from cell phones be harmful to users? Many studies have found 
little or no connection between using cell phones and various illnesses. Here is part 
of a news account of one study: 


A hospital study that compared brain cancer patients and a similar group 
without brain cancer found no statistically significant association between 
cell phone use and a group of brain cancers known as gliomas. But when 
20 types of glioma were considered separately an association was found 
between phone use and one rare form. Puzzlingly, however, this risk appeared 
to decrease rather than increase with greater mobile phone use.° 


Think for a moment. Suppose that the 20 null hypotheses (no association) for these 
20 significance tests are all true. Then each test has a 5% chance of being significant 
at the 5% level. That’s what a = 0.05 means: results this extreme occur 5% of the 
time just by chance when the null hypothesis is true. Because 5% is 1/20, we expect 
about 1 of 20 tests to give a significant result just by chance. That’s what the study 
observed. m 


evidence that you have found something. Running 20 tests and reaching that level 

only once is not. The caution about multiple analyses applies to confidence 
intervals as well. A single 95% confidence interval has probability 0.95 of capturing 
the true parameter each time you use it. The probability that all of 20 confidence 
intervals will capture their parameters is much less than 95%. If you think that mul- 
tiple tests or intervals may have discovered an important effect, you need to gather 
new data to make inferences about that specific effect. 


A Running one test and reaching the 5% level of significance is reasonably good 
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Apply Your Knowledge 


18.8 Is It Significant? In the absence of special preparation, SAT Mathematics 
(SATM) scores in 2013 varied Normally with mean w = 514 and o = 118. 
Fifty students go through a rigorous training program designed to raise their 
SATM scores by improving their mathematics skills. Either by hand or by 
using the P-Value of a Test of Significance applet, carry out a test of 
Hy pp = 514 
Hy pw > 514 


(with o = 118) in each of the following situations: 


(a) The students’ average score is x = 541. Is this result significant at the 
5% level? 


(b) The average score is x = 542. Is this result significant at the 5% level? 


The difference between the two outcomes in parts (a) and (b) is of no practical 
importance. Beware attempts to treat a = 0.05 as sacred. 


e 
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Inference in Practice 


18.9 Detecting Acid Rain. Emissions of sulfur dioxide by industry set off chemi- 
cal changes in the atmosphere that result in “acid rain.” The acidity of lig- 
uids is measured by pH ona scale of 0 to 14. Distilled water has pH 7.0, and 
lower pH values indicate acidity. Normal rain is somewhat acidic, so acid rain 
is sometimes defined as rainfall with a pH below 5.0. Suppose that pH mea- 
surements of rainfall on different days in a Canadian forest follow a Normal 
distribution with standard deviation a = 0.6. A sample of n days finds that 
the mean pH is x = 4.8. Is this good evidence that the mean pH yp for all 
rainy days is less than 5.0? The answer depends on the size of the sample. 

Either by hand or using the P-Value of a Test of Significance applet, carry 
out four tests of 


leke wr 5.0 
Isle, <= S00 


Use o = 0.6 and x = 4.8 in all four tests. But use four different sample 

sizes:n = 9,n = 16, n = 36, andn = 64. 

(a) What are the P-values for the four tests? The P-value of the same result 
x = 4.8 gets smaller (more significant) as the sample size increases. 

(b) For each test, sketch the Normal curve for the sampling distribution 
of x when H, is true. This curve has mean 5.0 and standard deviation 
0.6/ Vn. Mark the observed ¥ = 4.8 on each curve. (If you use the 
applet, you can just copy the curves displayed by the applet.) The 
same result x = 4.8 gets more extreme on the sampling distribution as the 
sample size increases. 


18.10 Confidence Intervals Help. Give a 95% confidence interval for the 
mean pH yp for each sample size in Exercise 18.9. The intervals, unlike the 
P-values, give a clear picture of what mean pH values are plausible for each 
sample. 

18.11 Searching for ESP. A researcher looking for evidence of extrasensory 
perception (ESP) tests 1000 subjects. Nine of these subjects do significantly 
better (P < 0.01) than random guessing. 


(a) Nine seems like a lot of people, but you can’t conclude that these 
nine people have ESP Why not? 


(b) What should the researcher now do to test whether any of these nine 
subjects have ESP? 


A wise user of statistics never plans a sample or an experiment without at the 
same time planning the inference. The number of observations is a critical part 
of planning a study. Larger samples give smaller margins of error in confidence 
intervals and make significance tests better able to detect effects in the population. 
But taking observations costs both time and money. How many observations are 
enough? We will look at this question first for confidence intervals and then for 
tests. Planning a confidence interval is much simpler than planning a test. It is also 
more useful, because estimation is generally more informative than testing. The 
section on planning tests is therefore optional. 

You can arrange to have both high confidence and a small margin of error by 
taking enough observations. The margin of error of the z confidence interval for the 
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mean of a Normally distributed population is m = z*a/Vn. To obtain a desired 
margin of error m, put in the value of z* for your desired confidence level, and solve 
for the sample size n. Here is the result. 


Sample Size for Desired Margin of Error 


The x confidence interval for the mean of a Normal population will have a specified 
margin of error m when the sample size is 


2*o B 
= | —— 
m 


Notice that it is the size of the sample that determines the margin of error. The 
size of the population does not influence the sample size we need. (This is true 
as long as the population is much larger than the sample.) 


In Example 16.3 (page 381), psychologists recorded the size of the tip of 20 
patrons in a restaurant when a message indicating that the next day’s weather would 
be good was written on their bill. We know that the population standard deviation is 
o = 2. We want to estimate the mean percentage tip w for patrons of this restaurant 
who receive this message on their bill within +0.5 with 90% confidence. How many 
patrons must we observe? 

The desired margin of error is m = 0.5. For 90% confidence, Table C gives 
z* = 1.645. Therefore, 


z*o \" 1.645 x 2\? 
n= = = 43.3 
m 0.5 
Because 43 patrons will give a slightly larger margin of error than desired, and 


44 patrons a slightly smaller margin of error, we must observe 44 patrons. Always 
round up to the next higher whole number when finding n. = 


(4) LaunchPad Online Resources 


° The StatBoards video, Calculating Sample Size for Estimating a Mean, 
provides examples of determining the sample size to achieve a desired 
confidence level. 


Apply Your Knowledge 


18.12 Body Mass Index of Young Women. Example 16.1 (page 375) assumed 
that the body mass index (BMI) of all American young women follows a 
Normal distribution with standard deviation o = 7.5. How large a sample 
would be needed to estimate the mean BMI y in this population to within 
+1 with 95% confidence? 


18.13 Number Skills of Eighth Graders. Suppose that scores on the math- 
ematics part of the National Assessment of Educational Progress (NAEP) 
test for eighth-grade students follow a Normal distribution with standard 
deviation 0 = 125. You want to estimate the mean score within +10 with 
90% confidence. How large an SRS of scores must you choose? 
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How large a sample should we take when we plan to carry out a test of signifi- 
cance? We know that if our sample is too small, even large effects in the population 
will often fail to give statistically significant results. Here are the questions we must 
answer to decide how many observations we need: 


Significance level. How much protection do we want against getting a significant 
result from our sample when there really is no effect in the population? 


Effect size. How large an effect in the population is important in practice? 


» Power. How confident do we want to be that our study will detect an effect 
of the size we think is important? 


The three boldface terms are statistical shorthand for three pieces of information. 
Power is a new idea. 


Let’s illustrate typical answers to these questions in the example of testing a new 
cola for loss of sweetness in storage (Example 17.2, page 392). Ten trained tasters 
rated the sweetness on a 10-point scale before and after storage, so that we have 
each taster’s judgment of loss of sweetness. From experience, we know that sweet- 
ness loss scores vary from taster to taster according to a Normal distribution with 
standard deviation about o = 1. To see if the taste test gives reason to think that the 
cola does lose sweetness, we will test 


Ho: = O 
H,: w > O 
Are 10 tasters enough, or should we use more? 


Significance level. Requiring significance at the 5% level is enough protection 
against declaring there is a loss in sweetness when in fact there is no change if 
we could look at the entire population. This means that when there is no change in 
sweetness in the population, one out of 20 samples of tasters will wrongly find a 
significant loss. 


Effect size. A mean sweetness loss of 0.8 point on the 10-point scale will be 
noticed by consumers and so is important in practice. 


Power. We want to be 90% confident that our test will detect a mean loss of 0.8 
point in the population of all tasters. We agreed to use significance at the 5% level 
as our standard for detecting an effect. So we want probability at least 0.9 that a 
test at the a = 0.05 level will reject the null hypothesis Ho: ~ = O when the true 
population mean is w = 0.8.8 


The probability that the test successfully detects a sweetness loss of the specified 
size is the power of the test. You can think of tests with high power as being highly 
sensitive to deviations from the null hypothesis. In Example 18.6, we decided that 
we want power 90% when the truth about the population is that uw = 0.8. 


*Power calculations are important in planning studies, but this more advanced material is not needed 
to understand the rest of the book. 
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The power of a test against a specific alternative is the probability that the test will 
reject Hy at a chosen significance level a when the specified alternative value of the 
parameter is true. 


For most statistical tests, calculating power is a job for comprehensive statisti- 
cal software. The z test is easier, but we will nonetheless skip the details. The two 
following examples illustrate two approaches: an applet that shows the meaning of 
power, and statistical software. 


Finding the power of the z test is less challenging than most other power calculations 
because it requires only a Normal distribution probability calculation. The Statistical 
Power applet does this and illustrates the calculation with Normal curves. Enter 
the information from Example 18.6 into the applet: hypotheses, significance level 
a@ = 0.05, alternative value = 0.8, standard deviation o = 1, and sample size 
n = 10. Click “Update.” The applet output appears in Figure 18.1. 


Ho: = 0 | 


Har @u>Od 
Ou<0 
Ouz0 

o=|1 | 


n=(10 


Alpha (a): 0.05 
Cs) -1265 -0.849 -0.432 -0.016 0400 0.816 1232 1649 2.065 
Power = 0.812 


alt. u= 0.8 | 


UPDATE 


RESET 


1.265 -0.849 -0.432 -0.016 0.400 0.816 1.232 1.649 2.065 


The power of the test against the specific alternative w = 0.8 is 0.812. That 
is, the test will reject H. about 81% of the time when this alternative is true. So 
10 observations are too few to give power 90%. m 


The two Normal curves in Figure 18.1 show the sampling distribution of x under 
the null hypothesis 4 = 0 (top) and also under the specific alternative uw = 0.8 
(bottom). The curves have the same shape because o does not change. The top 
curve is centered at w = O and the bottom curve at us = 0.8. The shaded region at 
the right of the top curve has area 0.05. It marks off values of x that are statistically 


FIGURE 18.1 

Output from the Statistical Power 
applet for Example 18.7, along with 
the information entered into the applet. 
The top curve shows the behavior of 
X when the null hypothesis is true 

(w = 0). The bottom curve shows 
the distribution of x when p = 0.8. 
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AI FISH, FISHERMEN, 
ik AND POWER 


Are the stocks of cod in the ocean 
off eastern Canada declining? 
Studies over many years failed to 
find significant evidence of a decline. 
These studies had low power—that 
is, they might fail to find a decline 
even if one were present. When 

it became clear that the cod were 
vanishing, quotas on fishing ravaged 
the economy in parts of Canada. 

If the earlier studies had had high 
power, they would likely have 
detected the decline. Quick action 
might have reduced the economic 
and environmental costs. 


Inference in Practice 


significant at the a = 0.05 level. The lower curve shows the probability of these 
same values when wu = 0.8. This area is the power, 0.812. 

The applet will find the power for any given sample size. It’s more helpful in 
practice to turn the process around and learn what sample size we need to achieve 
a given power. Statistical software will do this but usually doesn’t show the helpful 
Normal curves that are part of the applet’s output. 


Some software packages (for example, SAS, JMP, Minitab, and R) will calculate 
power. We asked Minitab to find the number of observations needed for the one- 
sided z test to have power 0.9 against several specific alternatives at the 5% sig- 
nificance level when the population standard deviation is o = 1. Here is the table 
that results: 


Sample Target 


Difference Size Power Actual Power 
O.1 857 0:39 0.900184 
0.2 215 03-9 0.901079 
0.3 96 0.9 0.902259 
0.4 54 0.9 0.902259 
0.5 35 0.9 0.905440 
0.6 24 0.9 0.902259 
0.7 18 0.9 0.907414 
0.8 14 0.9 0.911247 
0.9 alba 0:59 0.909895 
1.0 9 0.9 0.912315 


In this output, “Difference” is the difference between the null hypothesis value w = O 
and the alternative we want to detect. This is the effect size. The “Sample Size” 
column shows the smallest number of observations needed for power 0.9 against 
each effect size. 

We see again that our earlier sample of 10 tasters is not large enough to be 90% 
confident of detecting (at the 5% significance level) an effect of size 0.8. If we want 
power 90% against effect size 0.8, we need at least 14 tasters. The actual power 
with 14 tasters is 0.911247. 

Statistical software, unlike the applet, will do power calculations for most of the 
tests in this book. = 


The table in Example 18.8 makes it clear that smaller effects require larger 
samples to reach power 90%. Here is an overview of influences on “How large a 
sample do I need?” 


If you insist on a smaller significance level (such as 1% rather than 5%), 

you will need a larger sample. A smaller significance level requires stronger 
evidence to reject the null hypothesis. 

If you insist on higher power (such as 99% rather than 90%), you will need a 
larger sample. Higher power gives a better chance of detecting an effect when 
it is really there. 

At any significance level and desired power, a two-sided alternative requires a 
larger sample than a one-sided alternative. 

At any significance level and desired power, detecting a small effect requires a 
larger sample than detecting a large effect. 


18.5 Planning Studies: The Power of a Statistical Test 


Planning a serious statistical study always requires an answer to the question, 
“How large a sample do I need?” If you intend to test the hypothesis Ho: w = po 
about the mean p of a population, you need at least a rough idea of the size of the 
population standard deviation o and of how big a deviation w — po of the popula- 
tion mean from its hypothesized value you want to be able to detect. More elabo- 
rate settings, such as comparing the mean effects of several treatments, require 
more elaborate advance information. You can leave the details to experts, but you 
should understand the idea of power and the factors that influence how large a 
sample you need. 

To calculate the power of a test, we act as if we are interested in a fixed level 
of significance such as a = 0.05. That’s essential to do a power calculation, but 
remember that in practice we think in terms of P-values rather than a fixed level a. 
To effectively plan a statistical test we must find the power for several significance 
levels and for a range of sample sizes and effect sizes to get a full picture of how the 
test will behave. 


Type | and Type Il errors in significance tests We can assess the perfor- 
mance of a test by giving two probabilities: the significance level a and the power 
for an alternative that we want to be able to detect. The significance level of a test 
is the probability of reaching the wrong conclusion when the null hypothesis is true. 
The power for a specific alternative is the probability of reaching the right conclu- 
sion when that alternative is true. We can just as well describe the test by giving 
the probabilities of being wrong under both conditions. 


Type I and Type II Errors 


If we reject Hy when in fact H) is true, this is a Type | error. 

If we fail to reject Hy when in fact H, is true, this is a Type Il error. 

The significance level @ of any fixed level test is the probability of a Type I error. 
The power of a test against any alternative is the probability of correctly rejecting the 


null hypothesis for that alternative. It can be calculated as 1 minus the probability of 
a Type II error for that alternative. 


The possibilities are summed up in Figure 18.2. If Ho is true, our conclusion 
is correct if we fail to reject Hy and is a Type I error if we reject Ho. If H, is true, 
our conclusion is either correct or a Type II error. Only one error is possible at 
one time. 


Truth about 
the population 


Ho true H, true 


é : Correct 
Conclusion Reject Ho | Type | error conclusion 
based on 
sample 

Fail to reject Ho Correct Type Il error 
conclusion 


FIGURE 18.2 
The two types of error in testing 
hypotheses. 
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Because the probabilities of the two types of error are just a rewording of signifi- 
cance level and power, we can see from Figure 18.1 what the error probabilities are 
for the test in Example 18.6. 


P(Type | error) = P(reject Hp when in fact w = O) 


significance level a = 0.05 
P(Type Il error) = P(fail to reject Ho when in fact w = 0.8) 
= 1 — power = 1 — 0.812 = 0.188 


The two Normal curves in Figure 18.1 are used to find the probabilities of a Type | 
error (top curve, w = O) and of a Type Il error (bottom curve, w = 0.8). m™ 


(4) LaunchPad Online Resources 


* The StatBoards video, Power, Type |, and Type II Errors, provides examples 
of calculating power, type |, and type Il errors. 


Apply Your Knowledge 


18.14 What Is Power? The Trial Urban District Assessment (TUDA) mea- 
sures educational progress within participating large urban districts. 
TUDA gives a reading test scored from 0 to 500. A score of 243 is a 
“basic” reading level for eighth-graders.° Suppose scores on the TUDA 
reading test for eighth-graders in your district follow a Normal distribu- 
tion with standard deviation @ = 55. In 2013 the mean score for eighth- 
graders in your district was 255. You plan to give the reading test to a ran- 
dom sample of 25 eighth-graders in your district this year to test whether 
the mean score p for all eighth-graders in your district is still above the 
basic level. You will therefore test 


He p = 243 
Hy pw > 243 
If the true mean score is again 255, on average students are performing 


above the basic level. You learn that the power of your test at the 5% signifi- 

cance level against the alternative w = 255 is 0.29. 

(a) Explain in simple language what “power = 0.29” means. 

(b) Explain why the test you plan will not adequately protect you against 
deciding that average reading scores in your district are not above 
basic level. 

18.15 Thinking about Power. Answer these questions in the setting of 

Exercise 18.14 about measuring eighth-grade reading scores on TUDA. 

(a) You could get higher power against the same alternative with the 
same a by changing the number of measurements you make. Should 
you make more measurements or fewer to increase power? 

(b) Ifyou decide to use a = 0.10 in place of a = 0.05, with no other 
changes in the test, will the power increase or decrease? 

(c) Ifyou shift your interest to the alternative 4 = 260 with no other 
changes, will the power increase or decrease? 


18.16 How Power Behaves. In the setting of Exercise 18.14, use the Statistical 
Power applet to find the power in each of the following circumstances. 


(a) Standard deviation o = 55, significance level a = 0.05, alternative 
pw = 255, and sample sizes n = 25, n = 50, andn = 100. How does 
increasing the sample size with no other changes affect the power? 


(b) Standard deviation a = 55, significance level a = 0.05, sample size 
n = 25, and alternatives w = 255, w = 260, and w = 265. How 
do alternatives more distant from the hypothesis (larger effect sizes) 
affect the power? 


(c) Standard deviation o@ = 55, sample size n = 25, alternative uw = 255, 
and significance levels a = 0.05, a = 0.10, and a = 0.20. How does 
increasing the desired significance level affect the power? 


18.17 How Power Behaves. Use the Statistical Power applet to find the power 
of the test in Exercise 18.14 in each of these circumstances: significance 
level a = 0.05, alternative ww = 255, sample size n = 25, and o = 55, 
oa = 40, and o = 30. How does decreasing the variability of the popula- 
tion of measurements affect the power? 


18.18 Two Types of Error. Your company markets a computerized medical di- 
agnostic program used to evaluate thousands of people. The program scans 
the results of routine medical tests (pulse rate, blood tests, etc.) and refers 
the case to a doctor if there is evidence of a medical problem. The program 
makes a decision about each person. 

(a) What are the two hypotheses and the two types of error that the 
program can make? Describe the two types of error in terms of “false- 
positive” and “false-negative” test results. 

(b) The program can be adjusted to decrease one error probability, at 
the cost of an increase in the other error probability. Which error 
probability would you choose to make smaller, and why? (This is a 
matter of judgment. There is no single correct answer.) 


CHAPTER 18 SUMMARY 


Chapter Specifics 


m A specific confidence interval or test is correct only under specific conditions. 
The most important conditions concern the method used to produce the 
data. Other factors such as the shape of the population distribution may also be 


important. 


m Whenever you use statistical inference, you are acting as if your data are a random 


sample or come from a randomized comparative experiment. 


m Always do data analysis before inference to detect outliers or other problems that 


would make inference untrustworthy. 


m The margin of error in a confidence interval accounts for only the chance variation due 
to random sampling. In practice, errors due to nonresponse or undercoverage are often 


more serious. 


m There is no universal rule for how small a P-value in a test of significance is convincing 
evidence against the null hypothesis. Beware of placing too much weight on traditional 


significance levels such as a = 0.05. 
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m Very small effects can be highly significant (small P) when a test is based on a large 
sample. A statistically significant effect need not be practically important. Plot the data 
to display the effect you are seeking, and use confidence intervals to estimate the actual 
values of parameters. 


m On the other hand, lack of significance does not imply that Hp is true. Even a large 
effect can fail to be significant when a test is based on a small sample. 


m Many tests run at once will probably produce some significant results by chance alone, 
even if all the null hypotheses are true. 


m When you plan a statistical study, plan the inference as well. In particular, ask what 
sample size you need for successful inference. 


m The z confidence interval for a Normal mean has specified margin of error m when the 
sample size is 


m Here * is the critical value for the desired level of confidence. Always round n up 
when you use this formula. 


= (Optional material) The power of a significance test measures its ability to detect 
the truth of an alternative hypothesis. The power against a specific alternative is 
the probability that the test will reject Hy at a particular level a when that alterna- 
tive is true. 


m= (Optional material) Increasing the size of the sample increases the power of a signifi- 
cance test. You can use statistical software to find the sample size needed to achieve 
a desired power. 


Link It 


In Chapters 16 and 17, we introduced the basic reasoning behind statistical estimation 
and tests of significance. We applied this reasoning to the problem of making inferences 
about a mean of a Normally distributed population in a simple and artificial setting (the 
population standard deviation is known). In this chapter, we begin to move away from 
our simple and artificial conditions toward the reality of statistical practice. Later chap- 
ters deal with inference in fully realistic settings. Here we discuss more carefully condi- 
tions under which our procedures for inference do and do not hold. Where the data 
come from is crucial. Even if the population distribution is not Normal, our procedures 
are approximately correct if we have a moderate sample size and the shape of the popu- 
lation distribution is roughly symmetric. If the sample size is large, our procedures are 
approximately correct even if the population distribution is strongly skewed. However, 
any procedure based on sample statistics like the sample mean that are not resistant to 
outliers can be influenced by a few extreme observations. This is a caution that we first 
encountered in Chapter 4. 

In this chapter we also provide some cautions about confidence intervals and tests 
of significance. Some of these echo statements made in Chapters 8 and 9. Some are 
based on the behavior of confidence intervals and significance tests. These cautions 
will help you evaluate studies that report confidence intervals or the results of a test 
of significance. 

We conclude this chapter with material about planning a study that will use a con- 
fidence interval or a significance test. The fundamental question is, “What sample size 
do I need for successful inference?” 


Summary 


() LaunchPad Online Resources 


If you are having difficulty with any of the sections of this chapter, these online 
resources should help prepare you to solve the exercises at the end of this 
chapter. 

* StatTutor starts with a video review of each section and asks a series of 
questions to check your understanding. You may find this helpful if you need 
additional help understanding any of the material in this chapter. 

* LearningCurve provides you with a series of questions about the chapter 
geared to your level of understanding. 
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| CHECK YOUR SKILLS 


18.19 


18.20 


18.21 


18.22 


18.23 


The most important condition for sound conclusions 
from statistical inference is usually that 


(a) the data can be thought of as a random sample 
from the population of interest. 

(b) the population distribution is exactly Normal. 
(c) the data contain no outliers. 


The coach of a college men’s soccer team records 
the resting heart rates of the 27 team members. You 
should not trust a confidence interval for the mean 
resting heart rate of all male students at this college 
based on these data because 


(a) with only 27 observations, the margin of error will 
be large. 

(b) heart rates may not have a Normal distribution. 
(c) the members of the soccer team can’t be considered 
a random sample of all students. 


You turn your web browser to the online Harris 
Interactive Poll. Based on 2234 responses, the poll 
reports that 45% of U.S. adults believe global cli- 
mate change exists and humans are the main cause, 
30% believe global climate change exists but that its 
causes are mainly not related to humans, 13% do not 
believe global climate change exists, and 12% are 
undecided.’ You should refuse to calculate a 95% 
confidence interval for the proportion of all U.S. 
adults who do not believe global climate change ex- 
ists based on this sample because 


(a) this percent is too small. 

(b) inference from a voluntary response sample 
can’t be trusted. 

(c) the sample is too large. 


Many sample surveys use well-designed random 
samples but half or more of the original sample 
can’t be contacted or refuse to take part. Any errors 
due to this nonresponse 


(a) have no effect on the accuracy of confidence 
intervals. 

(b) are included in the announced margin of 

error. 

(c) are in addition to the random variation accounted 
for by the announced margin of error. 


A writer in a medical journal says: “An uncontrolled 
experiment in 37 women found a significantly im- 
proved mean clinical symptom score after treatment. 
Methodologic flaws make it difficult to interpret the 
results of this study.” The writer is skeptical about 
the significant improvement because 


(a) there is no control group, so the improvement 
might be due to the placebo effect or to the 
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18.25 


18.26 


18.27 


fact that many medical conditions improve 

over time. 

(b) the P-value given was P = 0.048, which is too 
large to be convincing. 

(c) the response variable might not have an exactly 
Normal distribution in the population. 


Vigorous exercise is associated with several years of 
longer life (on the average). Whether mild activities 
like slow walking are associated with longer life is not 
clear. Suppose that the added life expectancy associ- 
ated with slow walking daily for 10 minutes is just 
one month. A statistical test is more likely to find a 
significant increase in mean life for those who slow 
walk daily if 

(a) it is based on a very large random sample. 

(b) it is based on a very small random sample. 

(c) The size of the sample has little effect on 
significance for such a small increase in life 
expectancy. 


A medical experiment compared zinc supple- 
ments with a placebo for reducing the duration of 
colds. Let ws denote the mean decrease, in days, 
in the duration of a cold. A decrease to w = lisa 
practically important decrease. The significance 
level of a test of Ho: w = 0 versus H,: w > O is 
defined as 


(a) the probability that the test rejects Hp when 
me = Ois true. 

(b) the probability that the test rejects Hy when 
pe = J is true. 

(c) the probability that the test fails to reject Ho 
when pw = | is true. 


(Optional topic) The power of the test in 
Exercise 18.25 against the specific alternative w = 1 
is defined as 


(a) the probability that the test rejects Hp when 
pe = Ois true. 

(b) the probability that the test rejects Hy when 
pe = LJ is true. 

(c) the probability that the test fails to reject Ho 
when pw = | is true. 


(Optional topic) The power of a test is important in 
practice because power 


(a) describes how well the test performs when the 
null hypothesis is actually true. 

(b) describes how sensitive the test is to violations 
of conditions such as Normal population 
distribution. 

(c) describes how well the test performs when the 
null hypothesis is actually not true. 


Exercises 
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18.28 


18.29 


18.30 


18.31 


18.32 


18.33 


Hotel managers. In Exercise 17.32 (page 412), you 
carried out a test of significance based on data from 
148 general managers of three-star and four-star 
hotels. Before you trust your results, you would like 
more information about the data. What facts would 
you most like to know? 


Color blindness in Africa. An anthropologist 
claims that color blindness is less common in so- 
cieties that live by hunting and gathering than in 
settled agricultural societies. He tests a number of 
adults in two populations in Africa, one of each type. 
The proportion of color-blind people is significantly 
lower (P < 0.05) in the hunter-gatherer popula- 
tion. What additional information would you want 
to help you decide whether you believe the anthro- 
pologist’s claim? 


Sampling at the mall. A market researcher chooses 
at random from women entering a large outlet mall. 
One outcome of the study is a 95% confidence inter- 
val for the mean of “the highest price you would pay 
for a pair of shoes.” 


(a) Explain why this confidence interval does not 
give useful information about the population of all 
women. 

(b) Explain why it may give useful information 
about the population of women who shop at large 
outlet malls. 

Sensitive questions. The 2013 Youth Risk Behav- 
ior Survey found that 326 individuals in its random 
sample of 1216 Ohio high school students said that 
they had had multiple sexual partners during their 
life. That’s 27% of the sample. Why is this estimate 
likely to be biased? Do you think it is biased high or 
low? Does the margin of error of a 95% confidence 
interval for the proportion of all Ohio high school 
students with multiple partners during their life al- 
low for this bias? 


College degrees. At the Statistics Canada website, 
www.statcan.gc.ca, you can find the percent of 
adults in each province or territory who have at least a 
university certificate, diploma, or degree at bachelor’s 
level or above. It makes no sense to find x for these 
data and use it to get a confidence interval for the mean 
percent p in all 13 provinces or territories. Why not? 


An outlier strikes. You have data on an SRS of 
recent graduates from your college that shows how 
long each student took to complete a bachelor’s de- 
gree. The data contain one high outlier. Will this 
outlier have a greater effect on a confidence interval 
for mean completion time if your sample is small or if 
it is large? Why? 
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Can we trust this interval? Here are data on the 
percent change in the total mass (in tons) of wildlife 
in several West African game preserves in the years 


1971 to 1999:8 
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Software gives the 95% confidence interval for the 
mean annual percent change as —6.66% to —2.55%. 
There are several reasons why we might not trust 
this interval. 


(a) Examine the distribution of the data. What 
feature of the distribution throws doubt on the 
validity of statistical inference? 

(b) Plot the percents against year. What trend do 
you see in this time series? Explain why a trend 
over time casts doubt on the condition that years 
1971 to 1999 can be treated as an SRS from a larger 
population of years. 


When to use pacemakers. A medical panel pre- 
pared guidelines for when cardiac pacemakers 
should be implanted in patients with heart prob- 
lems. The panel reviewed a large number of medi- 
cal studies to judge the strength of the evidence 
supporting each recommendation. For each recom- 
mendation, they ranked the evidence as level A 
(strongest), B, or C (weakest). Here, in scrambled 
order, are the panel’s descriptions of the three lev- 
els of evidence.’ Which is A, which B, and which C? 
Explain your ranking. 

Evidence was ranked as level — when data were 
derived from a limited number of trials involving 
comparatively small numbers of patients or from 
well-designed data analysis of nonrandomized studies 

or observational data registries. 

Evidence was ranked as level — if the data were 
derived from multiple randomized clinical trials in- 
volving a large number of individuals. 

Evidence was ranked as level — when consensus of ex- 
pert opinion was the primary source of recommendation. 
What is significance good for? Which of the 
following questions does a test of significance 
answer? Briefly explain your replies. 


(a) Is the sample or experiment properly designed? 
(b) Is the observed effect due to chance? 
(c) Is the observed effect important? 
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Why are larger samples better? Statisticians prefer 
large samples. Describe briefly the effect of increas- 
ing the size of a sample (or the number of subjects in 
an experiment) on each of the following: 


(a) The P-value of a test, when H, is false and all 
facts about the population remain unchanged as n 
increases. 

(b) (Optional topic) The power of a fixed level a 
test, when a, the alternative hypothesis, and all 
facts about the population remain unchanged. 


Divorce rates. Divorce rates vary from city to city 
in the United States. We have lots of data on many 
USS. cities. Statistical software makes it easy to per- 
form dozens of significance tests on dozens of vari- 
ables to see which ones best predict divorce rate. 
One interesting finding is that those cities with ma- 
jor league ballparks tend to have significantly lower 
divorce rates than other cities. To improve your 
chances of a successful marriage, should you use 
this “significant” variable to decide where to live? 
Explain your answer. 


A test goes wrong. Software can generate samples 
from (almost) exactly Normal distributions. Here 
is a random sample of size 5 from the Normal dis- 
tribution with mean 10 and standard deviation 2: 
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These data match the conditions for a x test better 
than real data will: the population is very close to 
Normal and has known standard deviation o = 2, 
and the population mean is w = 10. Although we 
know the true value of 4, suppose we pretend that 
we do not and we test the hypotheses 


Ho: w = 8 
Hy pA8 


(a) What are the z statistic and its P-value? Is the 
test significant at the 5% level? 

(b) We know that the null hypothesis does not 
hold, but the test failed to give strong evidence 
against Hp. Explain why this is not surprising. 
Reducing the gender gap. In many science dis- 
ciplines, women are outperformed by men on test 
scores. Will “values affirmation training” improve 
self-confidence and hence performance of women 
relative to men in science courses? A study conduct- 
ed at a large university compares the scores of men 
and women at the end of a large introductory phys- 
ics course on a nationally normed standardized test 
of conceptual physics, the Force and Motion Con- 
ceptual Evaluation (FMCE). Half the women in the 
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course were given values affirmation training during 
the course; the other half received no training. The 
study reports that there was a significant difference 
(P < 0.01) in the gap between men’s and women’s 
scores, although the gap for women who received 
the values affirmation training was much smaller 
than that for women who did not receive training. 
The study also reports that a 95% confidence inter- 
val for the mean difference in scores on the FMCE 
exam between women who received the training 
and those who didn’t is 13 + 8 points. You are a 
faculty member in the physics department, and the 
provost, who is interested in women in science, asks 
you about the study. 

(a) Explain in simple language what “a significant 
difference (P < 0.01)” means. 

(b) Explain clearly and briefly what “95% confi- 
dence” means. 

(c) Is this study good evidence that requiring values 
affirmation training of all female students would 
greatly reduce the gender gap in scores on science 
tests in college courses? 


How far do rich parents take us? How much edu- 
cation children get is strongly associated with the 
wealth and social status of their parents. In social sci- 
ence jargon, this is “socioeconomic status,” or SES. 
But the SES of parents has little influence on wheth- 
er children who have graduated from college go on 
to yet more education. One study looked at whether 
college graduates took the graduate admissions tests 
for business, law, and other graduate programs. The 
effects of the parents’ SES on taking the LSAT test 
for law school were “both statistically insignificant 
and small.” 


(a) What does “statistically insignificant” mean? 
(b) Why is it important that the effects were small 
in size as well as insignificant? 


This wine stinks. How sensitive are the untrained 
noses of students? Exercise 16.27 (page 389) gives 
the lowest levels of dimethyl sulfide (DMS) that 
10 students could detect. You want to estimate 
the mean DMS odor threshold among all students 
and you would be satisfied to estimate the mean to 
within £0.1 with 99% confidence. The standard 
deviation of the odor threshold for untrained noses 
is known to be ao = 7 micrograms per liter of wine. 
How large an SRS of untrained students do you 
need? 


Pulling wood apart. You want to estimate the 
mean load needed to pull apart the pieces of wood 
in Exercise 16.25 (page 388) to within +600 
pounds with 95% confidence. How large a sample 
is needed? 


The following exercises concern the optional material on the power 


of a test. 


18.44 The first child has higher IQ. Does the birth 


18.45 


18.46 


order of a family’s children influence their IQ 
scores? A careful study of 241,310 Norwegian 
18- and 19-year-olds found that firstborn children 
scored 2.3 points higher on the average than sec- 
ond children in the same family. This difference 
was highly significant (P < 0.001). A commenta- 
tor said, “One puzzle highlighted by these latest 
findings is why certain other within-family stud- 
ies have failed to show equally consistent results. 
Some of these previous null findings, which have 
all been obtained in much smaller samples, may 
be explained by inadequate statistical power.”!® 
Explain in simple language why tests having low 
power often fail to give evidence against a null hy- 
pothesis even when the hypothesis is really false. 


How valium works. Valium is a common antide- 
pressant and sedative. A study investigated how 
valium works by comparing its effect on sleep in 
seven genetically modified mice and eight normal 
control mice. There was no significant difference be- 
tween the two groups. The authors say that this lack 
of significance “is related to the large inter-individ- 
ual variability that is also reflected in the low power 
(20%) of the test.”!! 


(a) Explain exactly what power 20% against a 
specific alternative means. 

(b) Explain in simple language why tests having 
low power often fail to give evidence against a 
null hypothesis even when the null hypothesis 
is really false. 

(c) What fact about this experiment most likely 
explains the low power? 


Dialysis. An article in the New England Journal of 
Medicine describes a randomized controlled trial that 
compared early versus late initiation of dialysis on 
the survival of adults with progressive chronic kid- 
ney disease. The experiment found no significant 
difference between early and late initiation of dialy- 
sis. According to the article, the study was designed 
to have power 80%, with a two-sided Type I error 
of 0.05, to detect a clinically important difference of 
approximately 10 percentage points in the absolute 


tisk of death.” 


(a) What fixed significance level was used in calcu- 
lating the power? 

(b) Explain to someone who knows no statistics 
why power 80% means that the experiment 

would probably have been significant if there was 

a difference between early and late initiation of 
dialysis. 
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Power. In Exercise 18.39, a sample from a Normal 
population with mean p = 10 and standard de- 
viation o = 2 failed to reject the null hypothesis 
Ho: w = 8 at the a = 0.05 significance level. Enter 
the information from this example into the Statistical 
Power applet. (Don’t forget that the alternative hy- 
pothesis is two-sided.) What is the power of the test 
against the alternative 4 = 10? Because the pow- 
er is not high, it isn’t surprising that the sample in 
Exercise 18.39 failed to reject Hp. 


Finding power by hand. Even though software is 
used in practice to calculate power, doing the work 
by hand builds your understanding. Return to the 
test in Example 18.6. There are n = 10 observations 
from a population with standard deviation o = 1 
and unknown mean pw. We will test 

Hy: w = 0 

Ea ee 0 
with fixed significance level a = 0.05. Find the 
power against the alternative uw = 0.8 by following 
these steps. 


(a) The z test statistic is 


XSi m= 0 


Ww Vivi © ak 


(Remember that you won’t know the numerical 
value of x until you have data.) What values of x 
lead to rejecting Hy at the 5% significance level? 

(b) Starting from your result in part (a), what values 
of x lead to rejecting H,? The area above these 
values is shaded under the top curve in Figure 18.1. 
(c) The power is the probability that you observe 
any of these values of x when ww = 0.8. This is the 
shaded area under the bottom curve in Figure 18.1. 
What is this probability? 


Finding power by hand: two-sided test. Exercise 18.48 
shows how to calculate the power of a one-sided z 
test. Power calculations for two-sided tests follow the 
same outline. We will find the power of a test based 
on a random sample of size 5, discussed in Exercise 
18.39. The hypotheses are 

Ho: wp = 8 

H,: wp #8 
The population of all measurements is Normal with 
standard deviation o = 2, and the alternative we 
hope to be able to detect is w = 10 (the actual 
population mean). (If you used the Statistical Power 
applet for Exercise 18.47, the two Normal curves for 
n = 5 illustrate parts (a) and (b) below.) 


z 


(a) Write the z test statistic in terms of the sample 
mean x. For what values of x does this two-sided 
test reject Hp at the 5% significance level? 
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Inference in Practice 


(b) Restate your result from part (a): what values of 
x lead to rejection of Hy? 

(c) Now suppose that = 10. What is the probability 
of observing an x that leads to rejection of H,? This is 
the power of the test. 


Error probabilities. You read that a statistical test 
at significance level a = 0.01 has power 0.64. What 
are the probabilities of Type I and Type II errors for 
this test? 


Power. You read that a statistical test at the a = 0.05 
level has probability 0.36 of making a Type II error 
when a specific alternative is true. What is the power 


18.53 


Two types of error. Go to the Statistical Significance 
applet. This applet carries out tests at a fixed signifi- 
cance level. When you arrive, the applet is set for 
the cola-tasting test of Example 18.6. That is, the 
hypotheses are 


We have an SRS of size 10 from a Normal popula- 
tion with standard deviation o = 1, and we willdoa 
test at level a = 0.05. At the bottom of the screen, 
a button allows you to choose a value of the mean ps 


of the test against this alternative? 


and then to generate samples from a population with 
that mean. 


Find the error probabilities. You have an SRS of 


size n = 25 from a Normal distribution with o = 1. 


You wish to test 


Hy: w = 0 
Isle ja = © 


You decide to reject Hy if x > O and not reject Hy 


otherwise. 


(a) Find the probability of a Type I error. That is, 
find the probability that the test rejects Hy when in 


fact uw = 0. 


(b) Find the probability of a Type II error when 
pw = 0.25. This is the probability that the test fails 


to reject Hy when in fact w = 0.25. 


(c) Find the probability of a Type II error when 


Ore 


18.54 


18.55 


18.56 


(a) Set wp = 0, so that the null hypothesis is true. 
Each time you click the button, a new sample ap- 
pears. If the sample x lands in the colored region, 
that sample rejects Hp at the 5% level. Click 100 
times rapidly, keeping track of how many samples 
reject Hy. Use your results to estimate the prob- 
ability of a Type I error. If you kept clicking forever, 
what probability would you get? 

(b) Now set w = 0.8. Example 18.6 shows that 
the test has power 0.812 against this alternative. 
Click 100 times rapidly, keeping track of how 
many samples fail to reject Ho. Use your results 
to estimate the probability of a Type II error. If 
you kept clicking forever, what probability would 
you get? 


Exploring the Web 


Practically important? Find an example of a study in which a statistically signifi- 
cant result may not be practically important. Summarize the study and its conclu- 
sions in your own words. The CHANCE website at www. causeweb.org/wiki/ 
chance//index.php/Main_ Page isa good place to look for examples. 


Wise use of inference. The report of the American Psychological Associa- 
tion’s Task Force on Statistical Inference is an excellent brief introduction to 
wise use of inference. The report appeared in the journal American Psycholo- 
gist in 1999. You can find a copy of the initial report on the web in the list of 
“TESI Publications/Links” from this journal at www.apa.org/science/ 
leadership/bsa/statistical/index.aspx. Read the initial report. 
Are the authors of the report opposed to the use of hypothesis testing? 
Describe one abuse of hypothesis testing that is cited in this report. 


Spotting bad science. Professor Pat Humphrey at Georgia Southern Uni- 
versity mentioned to us that the website compoundchem.com posts “A Rough 
Guide to Spotting Bad Science.” You can find this at www. compoundchem. 
com/2014/04/02/a-rough-guide-to-spotting-bad-science/. 
The guide gives 12 warnings. Which of these correspond to warnings given in 
this chapter? (Give page numbers.) 
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From Data Production Ine 


a Part Ill Summary 


to Inference: Part Ill stent 
Review 


n Part I of this book, you mastered data analysis, the use of graphs and numerical 

summaries to organize and explore any set of data. Part II introduced designs for 

data production. Part II introduced probability, and the reasoning of statistical 
inference. Parts IV and V will deal in detail with practical inference. 

Statistical inference draws conclusions about a population on the basis of 
sample data and uses probability to indicate how reliable the conclusions are. A 
confidence interval estimates an unknown parameter. A significance test shows 
how strong the evidence is for some claim about a parameter. 

The probabilities in both confidence intervals and tests tell us what would hap- 
pen if we used the method for the interval or test very many times. 


A confidence level is the success rate of the method for a confidence interval. 
This is the probability that the method actually produces an interval that 
captures the unknown parameter. A 95% confidence interval gives a correct 
result (captures the unknown parameter) 95% of the time when we use it 


repeatedly. 
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The Idea of a Confidence Interval 


Critical values + 2* mark off 


central area C of the standard 
Normal curve. 
ee Jn 


SRS sizen x+27* _o Proportion C 


Ti Jn of these 
: intervals Area C 
SRSsizen yar 
+ **4 in captures 
e e true u 
e e 
Population : ‘ — 0 7 
p= 
o known 
FIGURE 19.1 


« A P-value tells us how surprising the observed outcome would be if the null 
hypothesis were true. That is, P is the probability that the test would produce 
a result at least as extreme as the observed result if the null hypothesis really 


were true. Very surprising outcomes (small P-values) are good evidence that 
the null hypothesis is not true. 


Figures 19.1 and 19.2 use the x procedures introduced in Chapters 16 and 17 to 
present in picture form the big ideas of confidence intervals and significance tests. 
These ideas are the foundation for the rest of this book. We will have much to say 
about many statistical methods and their use in practice. In every case, the basic 
reasoning of confidence intervals and significance tests remains the same. 


The Idea of a Significance Test 


Values z have the 
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FIGURE 19.2 


Here are the most important skills you should have acquired from reading 
Chapters 12 to 18. 


A. Probability 


ik, 


Recognize that the occurrence of some phenomena are random. Probability 
describes the long-run regularity of random phenomena. 

Understand that the probability of an event is the proportion of times the 
event occurs in very many repetitions of a random phenomenon. Use the 
idea of probability as long-run proportion to think about probability. 

Use basic probability rules to detect illegitimate assignments of probability: 
any probability must be a number between 0 and 1, and the total probabil- 
ity assigned to all possible outcomes must be 1. 

Use basic probability rules to find the probabilities of events that are formed 
from other events. The probability that an event does not occur is | minus 
the probability it does occur. If two events are disjoint, the probability that 
one or the other occurs is the sum of their individual probabilities. 

Find probabilities in a discrete probability model by adding the probabilities 
of their outcomes. Find probabilities in a continuous probability model as 
areas under a density curve. 

Use the notation of random variables to make compact statements about ran- 
dom outcomes, such as P(x = 4) = 0.3. Be able to interpret such statements. 


B. Sampling Distributions 


il. 
Up 


Identify parameters and statistics in a statistical study. 

Recognize the fact of sampling variability: a statistic will take different 
values when you repeat a sample or experiment. 

Interpret a sampling distribution as describing the values taken by a sta- 
tistic in all possible repetitions of a sample or experiment under the same 
conditions. 

Interpret the sampling distribution of a statistic as describing the probabili- 
ties of its possible values. 

Understand how the sampling distribution of a sample statistic can be used 
to assess statistical significance. 


C. General Rules of Probability (Optional topic) 


IF 
Ds 


Use Venn diagrams to picture relationships among several events. 

Use the general addition rule to find probabilities that involve overlapping 
events. 

Understand the idea of independence. Judge when it is reasonable to as- 
sume independence as part of a probability model. 

Use the multiplication rule for independent events to find the probability 
that all of several independent events occur. 

Use the multiplication rule for independent events in combination with 
other probability rules to find the probabilities of complex events. 


Understand the idea of conditional probability. Find conditional prob- 
abilities for individuals chosen at random from a table of counts of possible 
outcomes. 


Part Ill Summary 
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(e 


8. 


Use the general multiplication rule to find P(A and B) from P(A) and the 
conditional probability P(B | A). 
Use tree diagrams to organize several-stage probability models. 


Binomial Distributions (Optional topic) 


Me 


Recognize the binomial setting: a fixed number n of independent success— 
failure trials with the same probability p of success on each trial. 
Recognize and use the binomial distribution of the count of successes in a 
binomial setting. 

Use the binomial probability formula to find probabilities of events involv- 
ing the count X of successes in a binomial setting for small values of n. 
Find the mean and standard deviation of a binomial count X. 

Recognize when you can use the Normal approximation to a binomial 
distribution. Use the Normal approximation to calculate probabilities that 
concern a binomial count X. 


E. The Sampling Distribution of a Sample Mean 


il, 


Recognize when a problem involves the mean x of a sample. Understand that 
x estimates the mean p of the population from which the sample is drawn. 
Use the law of large numbers to describe the behavior of x as the size of the 
sample increases. 

Find the mean and standard deviation of a sample mean x from an SRS of 
size n when the mean pw and standard deviation a of the population are 
known. 

Understand that x is an unbiased estimator of wx and that the variability of 
x about its mean p gets smaller as the sample size increases. 

Understand that x has approximately a Normal distribution when the 
sample is large (central limit theorem). Use this Normal distribution to 
calculate probabilities that concern x. 


Confidence Intervals 


de 


State in nontechnical language what is meant by “95% confidence” or 
other statements of confidence in statistical reports. 

Apply the four-step process (page 381) for any confidence interval. This 
process will be used more extensively in later chapters. Remember to 
describe your results in the context of the problem. 

Calculate a confidence interval for the mean pe of a Normal population 
with known standard deviation o, using the formula x + z*o/V*n. 
Understand how the margin of error of a confidence interval changes with 
the sample size and the level of confidence C. 

Find the sample size required to obtain a confidence interval of specified mar- 
gin of error m when the confidence level and other information are given. 
Identify sources of error in a study that are not included in the margin of 
error of a confidence interval, such as undercoverage or nonresponse. 


Significance Tests 


if 


State the null and alternative hypotheses in a testing situation when the 
parameter in question is a population mean p. 

Explain in nontechnical language the meaning of the P-value when you are 
given the numerical value of P for a test. 


3. Apply the four-step process (page 401) for any significance test. This pro- 
cess will be used more extensively in later chapters. Remember to describe 
your results in the context of the problem. 


4. Calculate the one-sample x test statistic and the P-value for both one-sided 
and two-sided tests about the mean w of a Normal population. 


5. Assess statistical significance at standard levels a, either by comparing P 
with @ or by comparing z with standard Normal critical values. 


6. Recognize that significance testing does not measure the size or importance 
of an effect. Explain why a small effect can be significant in a large sample 
and why a large effect can fail to be significant in a small sample. 


7. Recognize that any inference procedure acts as if the data were properly 
produced. The z confidence interval and test require that the data be an 
SRS from the population. 


The questions below include multiple choice, calculations, and short-answer questions. They will 
help you review the basic ideas and skills presented in Chapters 12 to 18. 


19.1 A randomly chosen subject arrives for a study of exercise and fitness. Describe a 
sample space for each of the following. (In some cases, you may have some free- 
dom in your choice of S.) 


(a) The subject is either female or male. 

(b) After 10 minutes on an exercise bicycle, you ask the subject to rate his or 
her effort on the Rate of Perceived Exertion (RPE) scale. RPE ranges in whole- 
number steps from 6 (no exertion at all) to 20 (maximal exertion). 

(c) You measure VO2 max, the maximum volume of oxygen consumed per 
minute during exercise. VO2 is generally between 2.5 and 6.1 liters per minute. 
(d) You measure the maximum heart rate (beats per minute). 


Internet search engines. Internet search sites compete for users because they sell advertising 
space on their sites and can charge more if they are heavily used. Choose an Internet search attempt 
at random. Here is the probability distribution for the site the search uses:! 


Site Google Microsoft Yahoo Ask Network Others 
Probability 0.68 0.18 0.10 0.02 ? 


Use this information to answer Questions 19.2 to 19.4. 
19.2 What is the probability that a search attempt is made at Microsoft or Yahoo? 
(a) 0.28 
(b) 0.30 
(c) 0.96 
(d) Cannot be determined from the information given 


19.3. What is the probability that a search attempt is made at a site other than the lead- 
ing four? 


(b 
(c 
( 


d) Cannot be determined from the information given 
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19.4 What is the probability that a search attempt is directed to a site other than 
Google? 
(a) 0.02 
(b) 0.32 
(c) 0.98 


(d) Cannot be determined from the information given 


How many in the house? In government data, a household consists of all occupants of a 
dwelling unit. Here is the distribution of household size in the United States: 


Number of Persons 1 7 5 4 5 6 tf 
Probability 0.27 034 016 0.14 0.06 0.02 0.01 


Choose an American household at random and let the random variable Y be the number of persons 
living in the household. Use this information to answer Questions 19.5 to 19.7. 


19.5 Express “more than one person lives in this household” in terms of Y. What is the 
probability of this event? 


19.6 WhatisP(2 < Y = 4)? 


19.7. Whatis P(Y 4 2)? 


(a) 0.27 
(b) 0.34 
(c) 0.51 
(d) 0.66 


How many children? Choose at random an American woman between the ages of 15 and 44. 
Here is the distribution of the number of children the woman has given birth to:? 


X = Number of Children 0 1 2 3 4 5) 
Probability 0.471 0.169 0.204 0.104 0.034 0.018 


(The few women with six or more children are included in the “five children” group.) Use this 
information to answer Questions 19.8 to 19.11. 


19.8 Check that this distribution satisfies the two requirements for a legitimate dis- 
crete probability model. 


19.9 Describe in words the event P(X = 2). What is the probability of this event? 


19.10 What is P(X < 2)? 


(a) 0.169 
(b) 0.471 
(c) 0.640 
(d) 0.844 


19.11 Write the event “a woman gives birth to three or more children” in terms of values 
of X. What is the probability of this event? 


Random number generators. Many random number generators allow users to specify the 
range of the random numbers to be produced. Suppose you specify that the random number Y can 
take any value between 0 and 5. The density curve of the outcome has a constant height between 0 
and 5, and height 0 elsewhere. Use this information to answer Questions 19.12 to 19.15. 


19.12 The random variable Y is 


(a) discrete. 

(b) continuous, but not Normal. 
(c) continuous and Normal. 

(d) none of the above. 


19.13 The height of the density curve between 0 and 5 is 


(a) 0.2. 
(b) 1. 
(Qs: 


(d) none of the above. 
19.14 Drawa graph of the density curve and find P(1 = Y S 3). 


19.15 Find P(4 < Y < 7). 


An IQ test. The Wechsler Adult Intelligence Scale (WAIS) is a common “IQ test” for adults. 
The distribution of WAIS scores for persons over 16 years of age is approximately Normal with 
mean 100 and standard deviation 15. Use this information to answer Questions 19.16 to 19.19. 


19.16 What is the probability that a randomly chosen individual has a WAIS score of 
105 or higher? 


(a) 0.0005 
(b) 0.3707 
(c) 0.4400 
(d) 0.6293 


19.17 What are the mean and standard deviation of the average WAIS score x for an 
SRS of 60 people? 


(a) Mean = 13.56, standard deviation = 15 
(b) Mean = 100, standard deviation = 15 

(c) Mean = 100, standard deviation = 1.94 
(d) Mean = 100, standard deviation = 0.25 


19.18 What is the probability that the average WAIS score of an SRS of 60 people is 105 
or higher? 


(a) 0.0049 
(b) 0.3707 
(c) 0.9738 
(d) None of the above 


19.19 Would your answers to any of Questions 19.16 to 19.18 be affected if the distribu- 
tion of WAIS scores in the adult population were distinctly non-Normal? Explain. 


Reaction times. The time that people require to react to a stimulus usually has a right-skewed 
distribution, as lack of attention or tiredness causes some lengthy reaction times. Reaction times for 
children with attention deficit hyperactivity disorder (ADHD) are more skewed, as their condition 
causes more frequent lack of attention. In one study, children with ADHD were asked to press 
the spacebar on a computer keyboard when any letter other than X appeared on the screen. With 
2 seconds between letters, the mean reaction time was 445 milliseconds (ms) and the standard 
deviation was 82 ms. Take these values to be the population 4 and o for ADHD children. Use 
this information to answer Questions 19.20 to 19.22. 


19.20 What are the mean and standard deviation of the mean reaction time x for a 
randomly chosen group of 15 ADHD children? For a group of 150 such children? 


19.21. The distribution of reaction time is strongly skewed. Explain briefly why we hesi- 
tate to regard x as Normally distributed for 15 children but are willing to use a 
Normal distribution for the mean reaction time of 150 children. 


19.22 What is the approximate probability that the mean reaction time in a group of 
150 ADHD children is greater than 450 ms? 


Pesticides in whale blubber: estimation. The level of pesticides found in the blubber of 
whales is a measure of pollution of the oceans by runoff from land and can also be used to identify dif- 
ferent populations of whales. A sample of eight male minke whales in the West Greenland area of the 
North Atlantic found the mean concentration of the insecticide dieldrin to be x = 357 nanograms 
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19.23 


© redbrickstock.com/Alamy 


19.24 


19.25 


19.26 


per gram of blubber (ng/g).* Suppose that the concentration in all such whales varies Normally with 
standard deviation a = 50 ng/g. Use this information to answer Questions 19.23 to 19.26. 


A 95% confidence interval to estimate the mean level is 


(a) 344.75 to 369.25. 
(b) 339.32 to 374.68. 
(c) 322.35 to 391.65. 
(d) 259.00 to 455.00. 


A 90% confidence interval to estimate the mean level is 


(a) 346.72 to 367.28. 
(b) 327.92 to 386.08. 
(c) 311.36 to 402.54. 


(d) 274.75 to 439.25. 


Find an 80% confidence interval for the mean concentration of dieldrin in the 
minke whale population. 


What general fact about confidence intervals do the margins of error of your three 
intervals in the previous problems illustrate? 


Estimating blood cholesterol. The distribution of blood cholesterol level in the population of 
young men aged 20 to 34 years is close to Normal with standard deviation o = 41 milligrams per 
deciliter (mg/dL). You measure the blood cholesterol of 14 cross-country runners. The mean level 
is x = 172 mg/dL. Assume that @ is the same as in the general population. Use this information 
to answer Questions 19.27 to 19.29. 


WELee7/ 


19.28 


HOW MANY MILES 
ik PER GALLON? 


As gasoline prices rise, more people 
pay attention to the government's 
gas mileage ratings of their vehicles. 
Until recently, these ratings 
overstated the miles per gallon we 
can expect in real-world driving. 
The ratings assumed a top speed of 
60 miles per hour, slow acceleration, 
and no air-conditioning. That 
doesn’t resemble what we see 
around us on the highway. Maybe 

it doesn’t resemble the way we 
ourselves drive. Starting with 

2008 models, the ratings assume 
higher speeds (80 miles per hour 
tops), faster acceleration, and 
air-conditioning in warm weather. 
Mileage ratings of the same vehicle 
dropped by about 12% in the city 19.31 
and 8% on the highway. 


19.29 


19.30 


A 90% confidence interval for the mean level w among cross-country runners is 


) 172 + 4.82 mg/dL. 
(b) 172 + 18.03 mg/dL. 
(c) 172 + 21.48 mg/dL. 
(d) none of the above. 


How large a sample is needed to cut the margin of error in the previous exercise 


in half? 


How large a sample is needed to cut the margin of error to +5 mg/dL? 
(a) 14 

(b) 68 

(c) 182 

(d) 259 


The Environmental Protection Agency (EPA) fuel economy ratings say that 
the 2014 Toyota Prius hybrid car gets 46 miles per gallon (mpg) on the highway. 
Deborah wonders whether the actual long-term average highway mileage p of 
her new Prius is less than 46 mpg. She keeps careful records of gas mileage for 
3000 miles of highway driving. Her result is x = 45.2 mpg. What are her null and 
alternative hypotheses? 

(a) Ho: w = 46, Hy: w < 46 

(b) Ho: w = 46, Hy: uw > 46 

(c) Ho: x = 46, H,: x < 46 

(d) Ho: x = 46, H,: x > 46 


According to the National Survey of Student Engagement (NSSE) the average 
amount of time that first-year college students spent preparing for class (studying, 


Testing 


reading, writing, doing homework or lab work, analyzing data, rehearsing, and 
other academic activities) in 2013 was 14.25 hours per week. Your college 
wonders if the average pw for its first-year students in 2013 differed from the 
national average. A random sample of 500 students who were first-year stu- 
dents in 2013 claims to have spent an average of x = 13.4 hours per week on 
homework in their first year. What are the null and alternative hypotheses 
for a comparison of first-year students at your college with national first-year 
students in 2013? 

(a) Ho: x = 14.25, Hy: x 7 14.25 

(b) Ho: x = 13.4,H,: x > 13.4 

(c) Ho: w = 14.25, Hy: wp ¥ 14.25 

(d) Ho: w = 13.4, Hy: pw > 13.4 


blood cholesterol. The distribution of blood cholesterol level in the population of 


young men aged 20 to 34 years is close to Normal with mean 188 milligrams per deciliter (mg/dL) 
and standard deviation 41 mg/dL. You measure the blood cholesterol of 14 young men aged 20 to 
34 years who have completed at least one marathon. The mean level is x = 172 mg/dL. Assume 
that o is the same as in the general population. Use this information to answer Questions 19.32 


to 19.34. 
19.32 


19.33 


19.34 


19.35 


19.36 


We suspect that the mean yp for all young men aged 20 to 34 years who have 
completed at least one marathon is lower than that of the population of 
all young men aged 20 to 34 years. Thus, we decide to test the hypotheses 
Ho: w = 188, H,: w < 188. The z test statistic for testing these hypotheses is 


(a) 5.46 

(b) —5.46 

(c) 1.46 

(d) —1.46 

The result is significant at 

(a) a = 0.01. 

(b) a = 0.05 but not at a = 0.01. 
(c) a = 0.10 but not at a = 0.05. 
(d) a = 0.25 but not at a = 0.10. 


You increase the sample of young men aged 20 to 34 years who have completed 
at least one marathon from 14 to 56. Suppose that this larger sample gives the 
same mean level, x = 172 mg/dL. Redo the test in exercises 19.32 and 19.33. The 
result is significant at 

(a) a = 0.01. 

(b) a = 0.05 but not at a = 0.01. 

(c) a = 0.10 but not at a = 0.05. 

(d) a = 0.25 but not at a = 0.10. 


The Food and Drug Administration regulates the amount of dieldrin in raw 
food. For some foods, no more than 100 ng/g is allowed. Using the information 
for Exercises 19.23 to 19.26, is there good evidence that the mean concentra- 
tion mw in whale blubber is above 100 ng/g? Carry out a test of the hypotheses 
Ho: w = 100, H,: w > 100 assuming that the “simple conditions” (page 374) 
hold. The P-value of your test is 

(a) above 0.10. 

(b) less than or equal to 0.10 but greater than 0.05. 

(c) less than or equal to 0.05 but greater than 0.01. 

(d) no more than 0.01. 

Infants weighing less than 1500 grams at birth are classed as “very low birth 
weight.” Low birth weight carries many risks. One study followed 113 male infants 
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19.37 


19.38 


19.39 


19.40 


19.41 


with very low birth weight to adulthood. At age 20, the mean IQ score for these 
men was x = 87.6.° IQ scores vary Normally with standard deviation o = 15. 
Give a 95% confidence interval for the mean IQ score at age 20 for all very-low- 
birth-weight males. 


IQ tests are scaled so that the mean score in a large population should be pp = 100. 
We suspect that the very-low-birth-weight population has mean score less than 
100. Does the study described in the previous exercise give good evidence that 
this is true? State hypotheses, carry out a test assuming that the “simple condi- 
tions” (page 374) hold, compute the P-value, and give your conclusion in plain 
language. 


When our brains store information, complicated chemical changes take place. In 
trying to understand these changes, researchers blocked some processes in brain 
cells taken from rats and compared these cells with a control group of normal 
cells. They say that “no differences were seen” between the two groups in four 
response variables. They give P-values 0.45, 0.83, 0.26, and 0.84 for these four 
comparisons.° Which of the following statements is correct? 


(a) It is literally true that “no differences were seen.” That is, the mean responses 
were exactly alike in the two groups. 

(b) The mean responses were exactly alike in the two groups for at least one 

of the four response variables measured, but not for all of them. 

(c) The statement “no differences were seen” means that the observed 
differences were not statistically significant at the significance level used by 

the researchers. 

(d) The statement “no differences were seen” means that the observed differ- 
ences were all less than 1 (and were actually 0.45, 0.83, 0.26, and 0.84 for 

these four comparisons). 


In a 2013 study, researchers compared various measurements on overweight 
first-born and second-born middle-aged men.’ They found that first-borns had a 
significantly higher weight (P = 0.013) than second-borns, but no significant dif- 
ference in total cholesterol (P = 0.74). Explain carefully why P = 0.013 means 
there is evidence that first-born middle-aged men may have higher weights than 
second-borns and why P = 0.74 provides no evidence that first-born middle- 
aged men may have different total cholesterol levels than second-borns. 


We often see televised reports of brushfires threatening homes in California. Some 
people argue that the modern practice of quickly putting out small fires allows 
fuel to accumulate and so increases the damage done by large fires. A detailed 
study of historical data suggests that this is wrong—the damage has risen simply 
because there are more houses in risky areas. As usual, the study report gives 
statistical information tersely. Here is the summary of a regression of number of 
fires on decade (nine data points, for the 1910s to the 1990s): “Collectively, since 
1910, there has been a highly significant increase (r? = 0.61, P < 0.01) in the 
number of fires per decade.” How would you explain this statement to someone 
who knows no statistics? Include an explanation of both the description given by 
r’ and its statistical significance. 


(Optional topic) Byron claims that the probability Georgia and Alabama will 
play in the SEC championship football game this year is 68%. The number 68% is 


(a) the proportion of times Georgia and Alabama have played in the champion- 
ship game in the past. 

(b) Byron’s personal probability that Georgia and Alabama will play in the SEC 
championship football game this year. 

(c) the area under a Normal density curve. 


(d) all of the above. 


19.42 


(Optional topic) Causes of death. Accidents, suicide, and murder are the leading 
causes of death for young adults. Here are the counts of violent deaths in a recent year 
among people 20 to 24 years of age: 


FEMALE MALE 
Accidents 1818 6457 
Homicide 457 2870 
Suicide 345 Sy) 


(a) Choose a violent death in this age group at random. What is the probability 
that the victim was male? 

(b) Find the conditional probability that the victim was male, given that the 
death was accidental. 

(c) Use your answers from parts (a) and (b) to explain whether sex and type of 
death are independent or not. 


(Optional topic) Distance learning. A study of the students taking distance learning 
courses at a university finds that they are mostly older students not living in the university town. 
If a distance learning student is chosen at random, the P(student is over 25 years old) = 0.7, 
the P(student is local) = 0.25 and the P(student is over 25 years old and local) = 0.05. Use 
this information to answer Questions 19.43 to 19.45. 


19.43 


19.44 


19.45 


19.46 


19.47 


The probability that the randomly selected student is over 25 years old or local is 


(a) 0.950. 
(b) 0.900. 
(c) 0.850. 
(d) 0.750. 


The probability that the randomly selected student is over 25 years old and not 
local is 


(a) 0.750. 
(b) 0.650. 
(c) 0.600. 
(d) 0.550. 


The conditional probability that the randomly selected student is over 25 years 
old given that they are not local is 


(a) 0.93, 

(b) 0.87. 

(c) 0.81. 

(d) 0.75. 

(Optional topic) Alysha makes 40% of her free throws. She takes five free 
throws in a game. If the shots are independent of each other, the probability that 
she misses the first two shots but makes the other three is about 


(a) 0.230. 

(b) 0.115. 

(c) 0.023. 

(d) 0.600. 

(Optional topic) Alysha makes 40% of her free throws. She takes five free 
throws in a game. If the shots are independent of each other, the probability that 
she makes exactly one of five shots is about 


(a) 0.259. 
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(Optional topic) Have you visited your dentist this year? Polls find that 62% of 
American aged 18 to 29 report having seen a dentist in the last 12 months. This percentage varies 
little among age groups, with the most pronounced differences in dental habits occurring across 
income groups.° Use this information to answer Questions 19.48 and 19.49. 


19.48 If you take an SRS of 1000 Americans aged 18 to 29, the approximate distribu- 
tion of the number of individuals in your sample who would report having seen a 
dentist in the last 12 months is 

(a) N(0.62, 15.35) 

(b) N(0.62, 236) 

(c) N(620, 15.35) 

(d) N(620, 236) 

If you take an SRS of 1000 Americans aged 18 to 29, find the approximate prob- 
ability that fewer than 600 individuals in your sample would report having seen a 
dentist in the last 12 months. 


19.49 


Supplementary exercises apply the skills you have learned in ways 
that require more thought or more elaborate use of technology. 


19.50 


19.51 


The addition rule. The addition rule for probabili- 
ties, P(A or B) = P(A) + P(B), is not always true. 
Give (in words) an example of real-world events A 
and B for which this rule is not true. 


Comparing wine tasters. Two wine tasters rate 
each wine they taste on a scale of 1 to 5. From data 
on their ratings of a large number of wines, we obtain 
the following probabilities for both tasters’ ratings of 
a randomly chosen wine: 


eight faces are triangles, which we will call 1, 2, 3, 
4, 5, 6, 7, and 8. Each of the squares is equally likely. 
Each of the triangles is also equally likely, but the tri- 
angle probability differs from the square probability. 
The probability of getting a triangle is 0.28. Give the 
probability model for the 14 possible outcomes. 


TASTER 2 
TASTER 1 1 2 3 4 5 : 
1 0.05 0.02 0.01 0.00 0.00 i : 
2 0.02 0.08 0.04 0.02 0.01 
5 0.01 0.04 0.25 0.05 0.01 19.53 Distributions: means versus individuals. The z 
4 0.00 0.02 0.05 0.18 0.02 confidence interval and test are based on the sam- 
5 Bor 0.01 0.01 0.02 0.08 pling distribution of the sample mean x. The Na- 
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(a) Why is this a legitimate discrete probability model? 
(b) What is the probability that the tasters agree 
when rating a wine? 

(c) What is the probability that Taster 1 rates a wine 
higher than Taster 2? What is the probability that 
Taster 2 rates a wine higher than Taster 1? 


A 14-sided die. An ancient Korean drinking game 
involves a 14-sided die. The players roll the die in 
turn and must submit to whatever humiliation is writ- 
ten on the up-face: something like “Keep still when 
tickled on face.” Six of the 14 faces are squares. Let’s 
call them A, B, C, D, E, and F for short. The other 


19.54 


tional Survey of Student Engagement (NSSE) asks 
college seniors to rate how much their experience 
at their institution has contributed to their ability 
to think critically and analytically. Ratings are on a 
scale of 1 to 7, with 7 being the highest (best) rat- 
ing. Suppose scores for all seniors in 2013 are Normal 
with mean = 3.3 and standard deviation o = 0.8. 
(a) You take an SRS of 100 seniors. According to 
the 99.7 part of the 68-95-99.7 rule, about what 
range of ratings do you expect to see in your sample? 
(b) You look at many SRSs of size 100. About what 
range of sample mean ratings x do you expect to see? 


Distributions: larger samples. In the setting of 
the previous exercise, how many seniors must you 


19.55 


19.56 


19.57 


sample to cut the range of values of x in half? This 19.58 
will also cut the margin of error of a confidence in- 
terval for w in half. Do you expect the range of indi- 
vidual scores in the new sample to also be much less 


than in a sample of size 100? Why? 


Normal body temperature? Here are the daily 19.59 


average body temperatures (degrees Fahrenheit) for 


20 healthy adults:!° am BODYTMP 


98.83 
97.42 
97.88 
98.33 


96.80 
97.30 
98.54 


98.12 
97.84 
98.33 


98.09 
97.90 
97.48 


98.74 
97.87 
99.64 
98.92 


97.89 
100.27 
97.87 


(a) Make a stemplot of the data. The distribution asc? 


is roughly symmetric and single-peaked. There 

is one mild outlier. We expect the distribution of 
the sample mean x to be close to Normal. 

(b) Do these data give evidence that the mean body 
temperature for all healthy adults is not equal to the 
traditional 98.6 degrees? Follow the four-step pro- 
cess for significance tests (page 401). (Suppose that 
body temperature varies Normally with standard 
deviation 0.7 degree.) 


Time in a restaurant. The owner of a pizza res- 
taurant in France knows that the time customers 
spend in the restaurant on Saturday evening has 
mean 90 minutes and standard deviation 15 min- 
utes. He has read that pleasant odors can influence 
customers, so he spreads a lavender odor through- 
out the restaurant. Here are the times (minutes) 
for customers on the next Saturday evening:!! 


ely) RSTRNT 


92 126 114 
93 76 98 
129 103 107 109 94 
102 108 95 121 109 
116 88 109 97 101 


106 89 
108 124 


137 
105 
105 
104 


106 19.61 


(a) Make a stemplot of the times. The distribu- 
tion is roughly symmetric and single-peaked, so 
the distribution of x should be close to Normal. 


(b) Suppose that the standard deviation 0 = 15 
minutes is not changed by the odor. Is there reason 
to think that the lavender odor has increased the 
mean time customers spend in the restaurant? 
Follow the four-step process for significance tests 
(page 401). 


Normal body temperature. Use the data in Exer- 
cise 19.55 to estimate mean body temperature with 
90% confidence. Follow the four-step process for 
confidence intervals (page 381). Ey.) BODYTMP 
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Time in a restaurant. Use the data in Exercise 
19.56 to estimate the mean time customers spend in 
this restaurant on Saturday evenings with 95% con- 
fidence. Follow the four-step process for confidence 
intervals (page 381). Ely) RSTRNT 


Tests from confidence intervals. You read in a 
U.S. Census Bureau report that a 90% confidence 
interval for the median income in 2012 of American 
households was $51,017 + $343. Based on this 
interval, can you reject the null hypothesis that the 
median income in this group is $50,000? What is the 
alternative hypothesis of the test? What is its signifi- 
cance level? 


(Optional topic) Testing for HIV. Enzyme im- 
munoassay tests are used to screen blood specimens 
for the presence of antibodies to HIV, the virus that 
causes AIDS. Antibodies indicate the presence of the 
virus. The test is quite accurate but is not always cor- 
rect. Here are approximate probabilities of positive 
and negative test results when the blood tested does 
and does not actually contain antibodies to HIV:!” 


TEST RESULT 
POSITIVE NEGATIVE 
Antibodies present 0.9985 0.0015 
Antibodies absent 0.0060 0.9940 


Suppose that 1% of a large population carries anti- 
bodies to HIV in their blood. 


(a) Draw a tree diagram for selecting a person from 
this population (outcomes: antibodies present or 
absent) and testing his or her blood (outcomes: test 
positive or negative). 

(b) What is the probability that the test is positive for 
a randomly chosen person from this population? 


(Optional topic) Type of high school attended. 
Choose a college freshman at random and ask what 
type of high school they attended. Here is the distri- 
bution of results:! 


Regular Public Public 
Type Public Charter Magnet 
Probability 0.758 0.029 0.035 

Private Private Home 
Type Religious Independent school 
Probability 0.109 0.063 0.006 


What is the conditional probability that a college 
freshman was home schooled, given that he or she 
did not attend a regular public high school? 
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(Optional topic) False HIV positives. Continue 
your work from Exercise 19.60. What is the prob- 
ability that a person has the antibody, given that 
the test is positive? (Your result illustrates a fact 
that is important when considering proposals for 
widespread testing for HIV, illegal drugs, or agents 
of biological warfare: if the condition being tested is 
uncommon in the population, most positives will be 
false positives.) 


(Optional topic) On the web. What kinds of web- 
sites do males aged 18 to 34 visit? About 50% of male 
Internet users in this age group visit an auction site 
such as eBay at least once a month." 


(a) If we interview a random sample of 12 male 
Internet users aged 18 to 34, what is the probability 
that exactly eight of the 12 have visited an auction 
site in the past month? 

(b) Suppose that we had interviewed a random 
sample of 500 men aged 18 to 34. What is the prob- 
ability that at least 235 of the men in the sample visit 
an online auction site at least once a month? (Check 


19.64 


19.65 


that the Normal approximation is permissible and 
use it to find this probability.) 


(Optional topic) Low power? It appears that 
eating oat bran lowers cholesterol slightly. At a time 
when oat bran was something of a fad, a paper in the 
New England Journal of Medicine found that it had 
no significant effect on cholesterol. The paper re- 
ported a study with just 20 subjects. Letters to the 
journal denounced publication of a negative finding 
from a study with very low power. Explain why lack 
of significance in a study with low power gives no 
reason to accept the null hypothesis that oat bran 
has no effect. 


(Optional topic) Type | and Type Il errors. 
Exercise 19.37 asks for a significance test of the null 
hypothesis that the mean IQ of very-low-birth-weight 
male babies is 100 against the alternative hypothesis 
that the mean is less than 100. State in words what 
it means to make a Type I error and a Type II error 
in this setting. 
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Inference about 
Variables 


ith the principles in hand, we proceed to practice, that is, to infer- 
ence in fully realistic settings. In the remaining chapters of this 
book, you will meet many of the most commonly used statistical 
procedures. We have grouped these procedures into two classes, 
corresponding to our division of data analysis into exploring variables and 
distributions and exploring relationships. The five chapters of Part IV concern 
inference about the distribution of a single variable and inference for comparing 
the distributions of two variables. Part V deals with inference for relationships 
among variables. In Chapters 20 and 21, we analyze data on quantitative vari- 
ables. We begin with the familiar Normal distribution for a quantitative variable. 
Chapters 22 and 23 concern categorical variables, so that inference begins with 
counts and proportions of outcomes. Chapter 24 reviews this part of the text. 
The four-step process for approaching a statistical problem can guide much 
of your work in these chapters. You should review the outlines of the four 
step process for a confidence interval (page 381) and for a test of significance 
(page 401). The statement of an exercise usually does the State step for you, 
leaving the Plan, Solve, and Conclude steps for you to complete. It is helpful 
to first summarize the State step in your own words to organize your thinking. 
Many examples and exercises in these chapters involve both carrying out infer- 
ence and thinking about inference in practice. Remember that any inference 
method is useful only under certain conditions, and that you must judge these 
conditions before rushing to inference. 


PART 


QUANTITATIVE 
RESPONSE 
VARIABLE 


CHAPTER 20 Inference 
about a Population Mean 


CHAPTER 21 Comparing 
Two Means 


CATEGORICAL 
RESPONSE 
VARIABLE 


CHAPTER 22 Inference 
about a Population Proportion 


CHAPTER 23 Comparing 
Two Proportions 


CHAPTER 24 Inference 
about Variables: Part IV Review 
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nterence about a 
Population Mean 


his chapter describes confidence intervals and significance tests for the mean 

pb of a population. We used the z procedures in this same setting to introduce 

the ideas of confidence intervals and tests. Now we discard the unrealistic 
condition that we know the population standard deviation o and present proce- 
dures for practical use. We also pay more attention to the real-data setting of our 
work. The details of confidence intervals and tests change only slightly when you 
don’t know o. More important, you can interpret your results exactly as before. To 
illustrate this, Example 20.2 repeats an example from Chapter 16. 


Confidence intervals and tests of significance for the mean pw of a Normal popu- 
lation are based on the sample mean x. Confidence intervals and P-values involve 
probabilities calculated from the sampling distribution of x. Here are the conditions 
needed for realistic inference about a population mean. 


CFAPTER 


In this chapter 
we cover... 


20.1 


20.2 
20.3 


20.4 
20.5 
20.6 
20.7 
20.8 


Conditions for inference 
about a mean 


The ft distributions 


The one-sample tf confidence 
interval 


The one-sample ft test 
Using technology 

Matched pairs ¢ procedures 
Robustness of t procedures 


Resampling and standard 
errors* 
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Conditions for Inference about a Mean 


We can regard our data as a simple random sample (SRS) from the population. 
This condition is very important. 

Observations from the population have a Normal distribution with mean p and 
standard deviation o. In practice, it is enough that the distribution be symmetric and 
single-peaked unless the sample is very small. Both 4 and o are unknown parameters. 


There is another condition that applies to all the inference methods in this book: 
the population must be much larger than the sample—say, at least 20 times as 
Biv All our examples and exercises satisfy this condition. Practical set- 
tings in which the sample is a large part of the population are rather special, 

and we will not discuss them. 

When the conditions for inference are satisfied, the sample mean x has the 
Normal distribution with mean yw and standard deviation @/Vn. Because we don’t 
know a, we estimate it by the sample standard deviation s. We then estimate the 
standard deviation of x by s/Vn. This quantity is called the standard error of the 
sample mean x. 


Standard Error 


When the standard deviation of a statistic is estimated from data, the result is called 
the standard error of the statistic. The standard error of the sample mean x is s/Vn. 


Apply Your Knowledge 


20.1 Travel Time to Work. A study of commuting times reports the travel times 
to work of a random sample of 1000 employed adults in Chicago.? The mean 
is x = 40.0 minutes and the standard deviation is s = 56.9 minutes. What is 
the standard error of the mean? 


20.2 Comparing Breathing Frequencies in Swimming. Researchers from 
the United Kingdom studied the effect of two breathing frequencies on per- 
formance times and on several physiological parameters in front crawl swim- 
ming. The breathing frequencies were one breath every second stroke (B2) 
and one breath every fourth stroke (B4). Subjects were 10 male collegiate 
swimmers. Each subject swam 200 meters using each breathing frequency: 
once with breathing frequency B2, and once on a different day with breath- 
ing frequency B4. A paper states that the results are expressed as mean plus 
or minus the standard deviation.’ One result reported in the paper states 
that the immediate postexercise heart rate for subjects when using breathing 
frequency B2 was 163 + 15 beats per minute. What are x and the standard 
error of the mean for these subjects? (This exercise is also a warning to read 
carefully: that 163 + 15 is not a confidence interval, yet summaries in this 
form are common in scientific reports.) 


If we knew the value of a, we would base confidence intervals and tests for w 
on the one-sample z statistic 
Xm 
z — 
a/Vn 
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This x statistic has the standard Normal distribution N(0, 1). In practice, we don’t 
know a, so we substitute the standard error s/Vn of X for its standard deviation 
a/Vn. The statistic that results does not have a Normal distribution. It has a dis- 
tribution that is new to us, called a t distribution. 


The One-Sample t Statistic and the t Distributions 


Draw an SRS of size n from a large population that has the Normal distribution with 
mean p and standard deviation 0. The one-sample t statistic 
ee Se 

s/Vn 


has the t distribution with n — 1 degrees of freedom. 


t 


The t statistic has the same interpretation as any standardized statistic: it says 
how far x is from its mean p in standard error units. There is a different t distribu- 
tion for each sample size. We specify a particular t distribution by giving its degrees © degrees of freedom 
of freedom. The degrees of freedom for the one-sample t statistic come from the 
sample standard deviation s in the denominator of t. We saw in Chapter 2 (page 58) 
that s has n — 1 degrees of freedom. There are other ¢ statistics with different 
degrees of freedom, some of which we will meet later. We will write the t distribu- 
tion with n — 1 degrees of freedom as t(n — 1) for short. 

Figure 20.1 compares the density curves of the standard Normal distribution and 
the t distributions with two and nine degrees of freedom. The figure illustrates these 
facts about the t distributions: 


= The density curves of the t distributions are similar in shape to the standard 
Normal curve. They are symmetric about zero, single-peaked, and bell-shaped. 
The variability of the t distributions is a bit greater than that of the standard 
Normal distribution. The t distributions in Figure 20.1 have more probability 
in the tails and less in the center than does the standard Normal. This is true 
because substituting the estimate s for the fixed parameter o introduces more 
variation into the statistic. 


|_| 


= As the degrees of freedom increase, the t density curve approaches the 
N(O, 1) curve ever more closely. This happens because s estimates @ more 
accurately as the sample size increases. So using s in place of o causes little 
extra variation in the statistic when the sample is large. 


t distributions have more --  t,2 degrees of freedom 
areainthetailsthanthe | /* Lt %\ t, 9 degrees of freedom 
standard Normal distribution. standard Normal 


FIGURE 20.1 

Density curves for the t distributions 

with two and nine degrees of 

:. freedom and for the standard Normal 
Nel distribution. All are symmetric with 

(0) center zero. The t distributions are 

somewhat more variable. 
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Table C in the back of the book gives critical values for the t distributions. Each 
row in the table contains critical values for the t distribution whose degrees of free- 
dom appear at the left of the row. For convenience, we label the table entries both 
by the confidence level C (in percent) required for confidence intervals and by the 
one-sided and two-sided P-values for each critical value. You have already used the 
standard Normal critical values in the z* row at the bottom of Table C. By looking 
down any column, you can check that the t critical values approach the Normal 


values as the degrees of freedom increase. If you use statistical software, you don’t 
need Table C. 


Figure 20.1 shows the density curve for the t distribution with nine degrees of free- 
dom. What point on this distribution has probability 0.05 to its right? In Table C, look 
in the df = 9 row above one-sided P-value 0.05 and you will find that this critical 
value is t* = 1.833. To use software, enter the degrees of freedom and the prob- 
ability you want to the /eft, 0.95 in this case. Here is Minitab’s output: 


Student’s t distribution with 9 DF 
P( X <= x ) x 
0.95 1.83311 


For the standard Normal distribution, the point that has probability 0.05 to the right 
is 1.645 (see the z* row at the bottom of Table C for the Normal distribution critical 
values). The value for the standard Normal distribution is smaller than the value for 
the t distribution. This is an example of what we mean by the statement that t distri- 
butions have more probability in the tails than the standard Normal. = 


() LaunchPad Online Resources 


* The StatBoards video, How to Find t Critical Values, discusses several exam- 
ples of finding t critical values. 


Apply Your Knowledge 


20.3 Critical Values. Use software or Table C to find 
(a) the critical value for a one-sided test with level a = 0.05 based on the 
t(3) distribution. 
(b) the critical value for a 98% confidence interval based on the t(25) 
distribution. 
20.4 More Critical Values. You have an SRS of size 30 and calculate the one- 
sample t statistic. What is the critical value t* such that 
(a) thas probability 0.025 to the right of t*? 
(b) thas probability 0.75 to the left of t*? 


To analyze samples from Normal populations with unknown a, just replace 
the standard deviation o/Vn of X by its standard error s/Vn in the z proce- 
dures of Chapters 16, 17, and 18. The confidence interval and test that result are 


20.3 The One-Sample t Confidence Interval 


one-sample t procedures. Critical values and P-values come from the t distribution 
with n — 1 degrees of freedom. The one-sample t procedures are similar in both 
reasoning and computational detail to the z procedures. 


The One-Sample t Confidence Interval 


Draw an SRS of size n from a large population having unknown mean p. A level C 
confidence interval for p is 


mR ae (i 


Sl° 


where t* is the critical value for the t(n — 1) density curve with area C between 
—t* and t*. This interval is exact when the population distribution is Normal and is 
approximately correct for large n in other cases. 


We will provide guidelines on how large a sample size is needed for the t confi- 
dence interval to be approximately correct in Section 20.7. 


Let’s look again at the study of tipping in a restaurant that we met in Example 16.3. 
We follow the four-step process for a confidence interval, outlined on page 381. 


(STATE? Does the expectation of good weather lead to more generous behavior? 
Psychologists studied the size of the tip in a restaurant when a message indicating 
that the next day’s weather would be good was written on the bill. Here are tips from 
20 patrons, measured in percent of the total bill:* 


20.8 18.7 19.9 20.6 21.9 23.4 22.8 24.9 22.2 20.3 
24.9 22.3 27.0 204 22.2 24.0 21.1 22.1 22.0 22.7 


This is one of three sets of measurements made, the others being tips received when 
the message on the bill said that the next day’s weather would not be good or there 
was no message on the bill. We want to estimate the mean tip for comparison with 
tips under the other conditions. 


PLAN: We will estimate the mean percentage tip yw for all patrons of this restaurant 
when they receive a message on their bill indicating that the next day’s weather will 
be good by giving a 95% confidence interval. 


ESRI We must first check the conditions for inference. 
= As in Chapter 16 (page 381), we are willing to regard these patrons as an SRS 
from all patrons of this restaurant. 
= The stemplot in Figure 20.2 does not suggest any strong departures from Normality. 
We can proceed to calculation. For these data, 
X=22.21 and s= 1.963 


The degrees of freedom are n — 1 = 19. From Table C, we find that for 95% con- 
fidence t* = 2.093. The confidence interval is 


z+ La 2221 + 2.093128 
Vn V20 
= 22.21 + 0.92 


= 21.29 to 23.13 percent 


| CONCLUDE: | We are 95% confident that the mean percentage tip for all patrons of 
this restaurant when their bill contains a message that the next day’s weather will be 
good is between 21.29 and 23.13. m 
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HQ Better 
Lik STATISTICS, 
BETTER BEER 


The t distribution and the t inference 
procedures were invented by William 
S. Gosset (1876-1937). Gosset 
worked for the Guinness brewery, 

and his goal in life was to make better 
beer. He used his new t procedures 

to find the best varieties of barley 

and hops. Gosset’s statistical work 
helped him become head brewer, a 
more interesting title than professor 
of statistics. Because Gosset published 
under the pen name “Student” you 
will often see the t distribution called 
“Student's t” in his honor. 


© Ariel Skelley/age fotostock 
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FIGURE 20.2 
Stemplot of the percentage tips, for 
Example 20.2. 
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Our work in Example 20.2 is very similar to what we did in Example 16.3 
(page 381). To make the inference realistic, we replaced the assumed o = 2 by 
s = 1.963 calculated from the data, and replaced the standard Normal critical value 
z* = 1.960 by the t critical value t* = 2.093. The resulting confidence interval is 
slightly wider than the one obtained in Example 16.3 (which was 21.33 to 23.09). 

The one-sample t confidence interval has the form 


estimate + t*SE imate 


where “SE” stands for “standard error.” We will meet a number of confidence in- 
tervals that have this common form. In Example 20.2, the estimate is the sample 
mean x, and its standard error is 


Software will find x, s, SE;, and the confidence interval from the data. Figure 20.5 
(page 465) displays typical software output for Example 20.2. 


() LaunchPad Online Resources 


* The StatBoards video, Calculating a One-Sample t Confidence Interval, 
discusses examples of constructing confidence intervals. 


Apply Your Knowledge 


20.5 Critical Values. What critical value t* from Table C would you use for a 
confidence interval for the mean of the population in each of the following 
situations? (If you have access to software, you can use software to determine 
the critical values.) 


(a) A 95% confidence interval based on n = 12 observations. 
(b) A 99% confidence interval from an SRS of two observations. 


(c) A 90% confidence interval from a sample of size 1001. 


20.6 How Much Will | Bet? Our decisions depend on how the options are pre- 
sented to us. Here’s an experiment that illustrates this phenomenon. Tell 20 
subjects that they have been given $50 but can’t keep it all. Then present 
them with a long series of choices among bets they can make with the $50. 
Scattered among these choices in random order are 64 choices that ask the 
subject to choose between betting a fixed amount and an all-or-nothing 
gamble. The odds for all the bets are the same, but in 32 of the choices, 
the fixed option reads “Keep $20,” and in the other 32 choices, the fixed 
option reads “Lose $30.” These two fixed options lead to exactly the same 
outcome, but people are more likely to choose the fixed option that says they 
lose money. Here are the percent differences (“Number of times chose ‘Lose 
$30’” minus “Number of times chose ‘Keep $20’” divided by the number of 
trials on which the 20 subjects chose the fixed-option gamble rather than 
the all-or-nothing bet).° Ely) GAMB1 


37.9 30.8 6.2 126° 143° 8:3 16.7 200° J05 21:7 
308 "273" 22 385" 783 105 “8.3 10 8 25.0F 7 
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sy) 
~ 


Make a stemplot. Is there any sign of a major deviation from Normality? 


=> 
op 
= 


All 20 subjects gambled a fixed amount more often when faced with 
a sure loss than when faced with a sure win. Give a 95% confidence 
interval for the mean percent increase in gambling a fixed amount 
when faced with a sure loss. 


20.7 She Sounds Tall! Presented with recordings of a pair of people of the same 
sex speaking the same phrase, can a listener determine which speaker is taller 42 
simply from the sound of their voice? Twenty-four young adults at Washing- 
ton University listened to 100 pairs of speakers, and within each pair were 
asked to indicate which of the two speakers was the taller. Here are the num- 
ber correct (out of 100) for each of the 24 participants:° £jy,) TALL 


65 61 67 59 58 62 56 67 61 67 63 53 
(ish 48) Go Sis) GY) 70) 5 sto Gs Sy sis} 70 


Assume that these young adults can be regarded as an SRS of all young adults 
in the U.S. Use a 99% confidence interval to estimate the mean number 
correct in the population of all young adults in the U.S. Follow the four-step 
process as illustrated in Example 20.2. 


Like the confidence interval, the t test is very similar to the x test we met earlier. 


The One-Sample t Test 


Draw an SRS of size n from a large population having unknown mean wp. To test the 
hypothesis H,: wu = fo, compute the one-sample t statistic 


_ XT Mo 
— s/Vn 


In terms of a variable T having the t(n — 1) distribution, the P-value for a test of Hp against 


t 


lee fi & fi, is IA 2 a) 


Hy ow A obo is 2P(T = |t]) 


18 
Ha bh < Mo is P(T =t) _ oe 
t 
It] 


These P-values are exact if the population distribution is Normal and are approxi- 
mately correct for large n in other cases. 
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We will provide guidelines on how large a sample size is needed for the P-values 
to be approximately correct in Section 20.7. 


45 We follow the four-step process for a significance test, outlined on page 401. 


(STATE?) To investigate water quality, on August 8, 2010, the Columbus Dispatch took 
water samples at 20 Ohio State Park swimming areas. Those samples were taken to 
laboratories and tested for fecal coliform, which are E. coli bacteria found in human 
and animal feces. An unsafe level of fecal coliform means there's a higher chance that 
disease-causing bacteria are present and more risk that a swimmer will become ill. 
Ohio considers it unsafe if a 100-milliliter sample (about 3.3 ounces) of water con- 
tains more than 400 coliform bacteria. 


I; Experts caution that the tests are a snapshot of the quality of the water at 
‘the time they were taken. Fecal coliform levels can change as weather and other 
conditions change. So we ask the question in terms of the mean fecal coliform level 
pb for all these swimming areas. The null hypothesis is “level is not unsafe,” and the 
alternative hypothesis is “level is unsafe.” 


Ho: bp = 400 
Ha: 2 > 400 


These swimming areas had been deemed safe in the past and the reason data 
were collected was to determine if this continued to be true. So we are looking for 
evidence that past conditions have changed. If these areas had been found to be 
unsafe in the past, and we were seeking evidence that they are now safe, our null and 
alternative hypotheses would be Hy: w = 400 and H,: w < 400. 


ESRB Here are the fecal coliform levels found by the laboratories:’ 


160 40 2800 80 2000 2000 1500 400 150 £500 
3000 2200 15 80 2000 2000 2600 600 1000 1500 


Are these data good evidence that, on average, the fecal coliform levels in these 
swimming areas were unsafe? 

First, check the conditions for inference. We are willing to regard these particular 
20 samples as an SRS from a large population of possible samples. Figure 20.3 is 
a histogram of the data. We can’t accurately judge Normality from 20 observations; 


Number of samples 


FIGURE 20.3 
Histogram of the fecal coliform level 
in Example 20.3. 


0 500 1000 1500 2000 2500 3000 3500 
Fecal coliform bacteria per 100 mL. 


20.4 The One-Sample t Test 


there are no outliers but the distribution of fecal coliform levels is somewhat skewed. 
P-values for the t test may be only approximately accurate. 
The basic statistics are 


X=1231 and s=1038 


The one-sample t statistic is 
X— Mo 1231 — 400 
~ s/Vn 1038/20 
= 3.580 


t 


The P-value for t = 3.580 is the area to the right of 3.580 under the t distribution 
curve with degrees of freedom n — 1 = 19. Figure 20.4 shows this area. Software 
(see Figure 20.6) tells us that P = 0.001. 

Without software, we can pin P between two values by using Table C. Search the 
df = 19 row of Table C for entries that bracket t = 3.580. 

The observed t lies between the critical values for one-sided P-values 0.001 and 
0.0005. 


QSOS There is quite strong evidence (P < 0.0025) that, on average, fecal 
coliform levels in these Ohio State Park swimming areas are unsafe. m 


t distribution with 
19 degrees of freedom 


P-value = 0.001 


+m Values of t ———» 


t = 3.580 


df = 19 
* 3.579 3.883 


One-sided P 0.001 0.0005 


FIGURE 20.4 
The P-value for the one-sided t test in 
Example 20.3. 


(3) LaunchPad Online Resources 


* The StatBoards video, Carrying Out a One-Sample t Test, discusses examples 
of testing. 


Apply Your Knowledge 


20.8 Is It Significant? The one-sample t statistic for testing 
Ho: ae 0 
Joke ji, = © 


from a sample of n = 25 observations has the value t = 1.75. 
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(a) What are the degrees of freedom for this statistic? 


(b) Give the two critical values t* from Table C that bracket t. What are 
the one-sided P-values for these two entries? 

(c) Is the value t = 1.75 significant at the 10% level? Is it significant at 
the 5% level? Is it significant at the 1% level? 

(d) (Optional) If you have access to suitable technology, give the exact 
one-sided P-value for t = 1.75. 


20.9 Is It Significant? The one-sample t statistic from a sample of n = 5 obser- 
vations for the two-sided test of 


Ho: w= 50 
Hy: pw # 50 
has the value t = 2.50. 


(a) What are the degrees of freedom for t? 

(b) Locate the two critical values t* from Table C that bracket t. What 
are the two-sided P-values for these two entries? 

(c) Is the value t = 2.50 statistically significant at the 10% level? At the 
5% level? At the 1% level? 

(d) (Optional) If you have access to suitable technology, give the exact 
two-sided P-value for t = 2.50. 


20.10 She Sounds Tall, Continued. Do the data of Exercise 20.7 give good 
reason to think that the mean number of correct identifications in the 
population of all young adults in the U.S. is greater than 50 (the expected 
number correct if one is just guessing)? Carry out a test of significance, 
following the four-step process as illustrated in Example 20.3. TALL 


Any technology suitable for statistics will implement the one-sample t pro- 
cedures. As usual, you can read and use almost any output now that you know 
what to look for. Figure 20.5 displays output for the 95% confidence interval of 
Example 20.2 from a graphing calculator, two statistical programs, a spreadsheet 
program, and the Crunchlt! software package. The calculator, Minitab, JMP, 
and CrunchIt! outputs are straightforward. All three give the estimate x and 
the confidence interval plus a clearly labeled selection of other information. The 
confidence interval agrees with our hand calculation in Example 20.2. In general, 
software results are more accurate because of the rounding in hand calculations. 
Excel gives several descriptive measures but does not give the confidence interval. 
The entry labeled “Confidence Level (95.0%)” is the margin of error. You can use 
this together with x to get the interval using either a calculator or the spread- 
sheet’s formula capability. 

Figure 20.6 displays output for the t test in Example 20.3. The graphing calcula- 
tor, Minitab, JMP, and Crunchlt! give the sample mean x, the t statistic, and its 
P-value. Accurate P-values are the biggest advantage of software for the t proce- 
dures. Excel is, as usual, more awkward than software designed for statistics. It lacks 
a one-sample t test menu selection but does have a function named TDIST for tail 
areas under t density curves. The Excel output shows functions for the t statistic and 
its P-value to the right of the main display, along with their values t = 3.582967678 
and P = 0.00099192. 


20.5 Using Technology 


Texas Instruments Graphing Calculator FIGURE 20. 
The t confidence interval for Example 20.2: 
T : rit. 32a 1 a ae output from a graphing calculator, three 
hoe #1 eo bi a ~126 statistical programs, and a spreadsheet 
AG program. 
A 


One-Sample T: Percent 


Variable W Mean StDevy SE Mean 95% CI 
Percent 20 22.210 1.963 0.439 (21.292, 23.128)| 


JMP 
\¥| Confidence Intervals 

Parameter Estimate Lower Cl Upper Cl 1-Alpha 
Mean 22.21 2129151 2312849 +~—s-0.950 
Excel 


This is an estimate of uw. 


Microsoft Excel = E, 


a os ee ee 
a | a Se 
eG a ee 
Iscandard Berar [| oases] | |_| 
median SSCP tsp Sid Sd Cid 
iSeandee@ pevintion [| _____ 4.96] J |__| 
[gaupis weriance [1 __3.e5] _] | |. 
range SSCP sof SOiSSCidCOSC~*idS 
hiinimun Sd eof i Oi Cid 
faxcimam ir} Or Cd Cd 


This is the margin of 
error £t* SE. 


Crunchlt! 


Results - t 1-Sample -|x 


Sample Mean: 22.21 


Standard Error: 0.4388 
df: 19 
95% Confint: (21.29, 23.13) 
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Texas Instruments Graphing Calculator 
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Minitab 


One-Sample T: Fecal coliform per 100 ml 
Test of mu = 400 vs > 400 
95% 
Lower 


Variable N Mean StDev SE Mean Bound T P 
fecal coliform 20 1231 1038 232 630 3.58 0.001 


I | 


JMP 
(~) Test Mean 


400 
1231 725 
ils} 
1037.54 


-500 0 500 1000 1500 


Microsoft Excel 


fecal coliform per 100 ml Pee ees 
1231.2 
232.0004) 
1250) 0 
Pe ee 
Sample Variance [1 _—— 
[as a 

Minimum es 
as ee 

Ey 

= = 
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This is the t statistic. 


Standard Error 


This is the P-value. 


Standard Deviation 


Confidence Level (95% 


FIGURE 20.6 
The t test for Example 20.3: output from a graphing calculator, three statistical programs, and a spreadsheet program. 
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Crunchlt! FIGURE 20.6 
(Continued) 
Results - t 1-Sample ax 
Expor + 


Null hypothesis: Population mean = 400 


Alternative hypothesis: Population mean > 400 


n: 20 
Sample Mean: 1231 
Standard Error: 232.0 

df: 19 

t statistic: 3.583 
P-value: 0.0009919 


Often the goal of an investigation is to demonstrate that a treatment causes an 
observed effect. In Chapter 9, we learned that randomized comparative studies are 
more convincing than single-sample investigations for demonstrating causation. For 
that reason, one-sample inference is less common than comparative inference. One 
common design to compare two treatments makes use of one-sample procedures. 
Matched pairs designs were discussed in Chapter 9. In a matched pairs design, = matched pairs design 
subjects are matched in pairs and each treatment is given to one subject in each 
pair. Another situation calling for matched pairs is before-and-after observations on 
the same subjects. 


Matched Pairs t Procedures 


To compare the responses to the two treatments in a matched pairs design, find the 
difference between the responses within each pair. Then apply the one-sample t pro- 
cedures to these differences. 


The parameter yz in a matched pairs t procedure is the mean of the differences in 
the responses to the two treatments within matched pairs of subjects in the entire 
population. 


(STATE! Humans often collaborate to solve problems. Will chimpanzees recruit another 
chimp when solving a problem requires collaboration? Researchers presented chim- 
panzee subjects with food outside their cage that they could bring within reach by 
pulling two ropes, one attached to each end of the food tray. If a chimp pulled only 
one rope, the rope came loose and the food was lost. Another chimp was available 
as a partner, but only if the subject unlocked a door joining two cages. (Chimpanzees 
learn these things quickly.) The same eight chimpanzee subjects faced this problem 
in two versions: the two ropes were close enough together that one chimp could pull 
both (no collaboration needed), or the two ropes were too far apart for one chimp to 


CHIMPS 
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Manoj Shah/Getty Images 
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Stemplot and dotplot of the differences, 
for Example 20.4. 


pull both (collaboration needed). Table 20.1 shows how often in 24 trials for each 
version each subject opened the door to recruit another chimp as partner.® Is there 
evidence that chimpanzees recruit partners more often when a problem requires 
collaboration? 


COLLABORATION NEEDED 


CHIMPANZEE YES NO DIFFERENCE 
Namuiska 16 0 16 
Kalema 16 il 15 
Okech 23 5 18 
Baluku 19 3 16 
Umugenzi 15 4 11 
Indi 20 9 11 
Bili 24 16 8 
Asega 24 20 4 


P _Take ps to be the mean difference (collaboration required minus not) in the 
number of times a subject recruited a partner. The null hypothesis says that the need 
for collaboration has no effect, and H, says that partners are recruited more often 
when the problem requires collaboration. So we test the hypotheses 


Ho: W = O 
H,: w > O 


ESQRVEA The subjects are “semi-free-ranging chimpanzees at Ngamba Island 
Chimpanzee Sanctuary in Uganda.” We are willing to regard them as an SRS from 
their species. To analyze the data, we examine the difference in the number of times 
a chimp recruited a partner, so subtract the “no collaboration needed” count from the 
“collaboration needed” count for each subject. The eight differences form a single 
sample from a population with unknown mean jp. They appear in the “Difference” 
column in Table 20.1. All the chimpanzees recruited a partner more often when the 
ropes were too far apart to be pulled by one chimp. 

The stemplot in Figure 20.7 creates the impression of a left-skewed distribution. 
This is a bit misleading, as the dotplot in the bottom part of Figure 20.7 shows. A 
dotplot simply places the observations on an axis, stacking observations that have the 
same value. It gives a good picture of distributions with only whole-number values. 
We know that observations that can take only whole-number values cannot come 
from a Normal population.? In practice, researchers are willing to treat such observa- 
tions as coming from a Normal population if there are more than just a few possible 
values and the distribution appears approximately Normal. Of course, we can’t assess 
approximate Normality from just eight observations, but there are no signs of major 
departures from Normality. The researchers used the matched pairs t test. 

The eight differences have 


X= 12.375 and s= 4.749 
The one-sample t statistic is therefore 


x-O 12,375-0 
s/Vn _ 4.749/V8 
= 737 


eS 


Find the P-value from the t(7) distribution. (Remember that the degrees of freedom 
are one less than the sample size.) Table C shows that 7.37 is greater than the critical 
value for one-sided P = 0.0005. The P-value is therefore less than 0.0005. Software 
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df =7 
* 4.785 5.408 
One-sided P | 0.001 0.0005 


says that P = 0.000077. 


| CONCLUDE: [ilitemertey give very strong evidence (P < 0.0005) that chimpanzees 

recruit a collaborator more often when faced with a problem that requires a collabora- 
tor to solve. That is, chimpanzees recognize when collaboration is necessary, a skill 
that they share with humans. 


Example 20.4 illustrates how to turn matched pairs data into single-sample data 
by taking differences within each pair. We are making inferences about a single 
population, the population of all differences within matched pairs. It is incorrect to 

ignore the matching and analyze the data as if we had two samples of chim- 

A panzees, one facing ropes close together and the other facing ropes far apart. 

Inference procedures for comparing two samples assume that the samples 

are selected independently of each other. This condition does not hold when the 

same subjects are measured twice. The proper analysis depends on the design used 
to produce the data. 


(4) LaunchPad Online Resources 


* The StatBoards video, Matched Pairs t Procedures, discusses examples of 
inference for matched pairs. 


Apply Your Knowledge 


Many exercises from this point on ask you to give the P-value of a t test. If you have suitable 
technology, give the exact P-value. Otherwise, use Table C to give two values between which 
P lies. 


20.11 Eye Grease. Athletes performing in bright sunlight often smear black eye 
grease under their eyes to reduce glare. Does eye grease work? In one study, 
16 student subjects took a test of sensitivity to contrast after three hours 
facing into bright sun, both with and without eye grease. (Greater sensitiv- 
ity to contrast improves vision and glare reduces sensitivity to contrast.) 
This is a matched pairs design. Here are the differences in sensitivity, with 
eye grease minus without eye grease:'° Ej) EYEGRs 


0.07 0.64 —-012 -0.05  —-0.18 0.14 —0.16 0.03 
0.05 0.02 0.43 O24 SON 0128 0.05 0.29 


We want to know whether eye grease increases sensitivity to contrast on the 
average. Do the data support this idea? Complete the Plan, Solve, and Con- 
clude steps of the four-step process, following the model of Example 20.4. 

20.12 Eye Grease, Continued. How much more sensitive to contrast are 
athletes with eye grease than without eye grease? Give a 99% confidence 
interval to answer this question. EYEGRS 


The t confidence interval and test are exactly correct when the distribution of 
the population is exactly Normal. No real data are exactly Normal. At best, the 
Normal distribution is an excellent approximation to the actual distribution of data 
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A) CATCHING 
WihM CHEATERS 


A certification test for surgeons 

asks 277 multiple-choice questions. 
Smith and Jones have 193 common 
right answers and 53 identical wrong 
choices. The computer flags their 
246 identical answers as evidence 

of possible cheating. They sue. The 
court wants to know how unlikely it 
is that exams this similar would occur 
just by chance. That is, the court 
wants a P-value. Statisticians offer 
several P-values based on different 
models for the exam-taking process. 
They all say that results this similar 
would almost never happen just by 
chance. Smith and Jones fail the 
exam. 


Inference about a Population Mean 


from real studies.'! The usefulness of the t procedures in practice therefore depends 
on how strongly they are affected by lack of Normality. 


Robust Procedures 


A confidence interval or significance test is called robust if the confidence level or 
P-value does not change very much when the conditions for use of the procedure are 
violated. 


The condition that the population is Normal rules out outliers, so the presence 
of outliers shows that this condition is not fulfilled. The t procedures are not ro- 
bust against outliers unless the sample is large, because x and s are not resistant 
to outliers. 

Fortunately, the t procedures are quite robust against non-Normality of the 
population except when outliers or strong skewness are present. (Skewness is more 
serious than other kinds of non-Normality.) As the size of the sample increases, the 
central limit theorem ensures that the distribution of the sample mean x becomes 
more nearly Normal, and that the t distribution becomes more accurate for critical 
values and P-values of the t procedures. 

Always make a plot to check for skewness and outliers before you use the t pro- 
cedures for small samples. For most purposes, you can safely use the one-sample t 
procedures when n = 15 unless an outlier or quite strong skewness is present. Here 
are practical guidelines for inference on a single mean.” 


Using the t Procedures 


Except in the case of small samples, the condition that the data are an SRS from 
the population of interest is more important than the condition that the population 
distribution is Normal. 


Sample size less than 15: Use t procedures if the data appear close to Normal 
(roughly symmetric, single peak, no outliers). If the data are clearly skewed or if 
outliers are present, do not use t. 


= Sample size at least 15: The t procedures can be used except in the presence of 
outliers or strong skewness. 


« Large samples: The t procedures can be used even for clearly skewed distributions 
when the sample is large, roughly n = 40. 


Figure 20.8 shows plots of several data sets. For which of these can we safely use 
the t procedures?'? 


» Figure 20.8(a) is a histogram of the percent of each state’s adult residents who 
are college graduates. We have data on the entire population of 50 states, so infer- 
ence is not needed. We can calculate the exact mean for the population. There is 

no uncertainty due to having only a sample from the population, and no need 
A for a confidence interval or test. /f these data were an SRS from a larger popu- 
lation, t inference would be safe despite the mild skewness because n = 50. 


= Figure 20.8(b) is a stemplot of the force required to pull apart 20 pieces of 
Douglas fir. The data are strongly skewed to the left with possible low outliers, so 
we cannot trust the t procedures for n = 20. 
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m Figure 20.8(c) is a stemplot of the lengths of 23 specimens of the red variety 
of the tropical flower Heliconia. The data are mildly skewed to the right and there 
are no outliers. We can use the t distributions for such data. 


» Figure 20.8(d) is a histogram of the heights of the female students in a college 
class. This distribution is quite symmetric and appears close to Normal. We can use 
the t procedures for any sample size. ™ 
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FIGURE 20.8 


Can we use t procedures for these data? (a) Percent of adult college graduates in the 50 states. No, this is an entire population, not a sample. 

(b) Force required to pull apart 20 pieces of Douglas fir. No, there are just 20 observations and strong skewness. (c) Lengths of 23 tropical flow- 
ers of the same variety. Yes, the sample is large enough to overcome the mild skewness. (d) Heights of female students. Yes, for any size sample, 
because the distribution is close to Normal. 


(3) LaunchPad Online Resources 


* The StatBoards video, When Can We Use t Procedures, discusses examples 
of checking assumptions and robustness. 
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Apply Your Knowledge 


© Eric Nathan/Alamy 


20.13 Diamonds. A group of earth scientists studied the small diamonds found 
in a nodule of rock carried up to the earth’s surface in surrounding rock. 
This is an opportunity to examine a sample from a single population of dia- 
monds formed in a single event deep in the earth.'* Table 20.2 presents data 
on the nitrogen content (parts per million) and the abundance of carbon 13 
in these diamonds. (Carbon has several isotopes, or forms with different 
numbers of neutrons in the nuclei of their atoms. Carbon 12 makes up 
almost 99% of natural carbon. The abundance of carbon 13 is measured by 
the ratio of carbon 13 to carbon 12, in parts per thousand more or less than 
a standard. The minus signs in the data mean that the ratio is smaller in 
these diamonds than in standard carbon.) am DMNDS 


DIAMOND NITROGEN (PPM) CARBON 13 RATIO | DIAMOND NITROGEN (PPM) CARBON 13 RATIO 
1 487 —2.78 13 273 =2:13 
2; 1430 1.38) 14 94 = 71.515) 
3 60 —4.26 15 69 —3.83 
4 244 II) 16 262 —2.04 
5 196 —2.12 17 120 —2.82 
6 274 ae Ol 18 302 O85 
7 41 —3.68 19 75 —3.57 
8 D4 Shy) 20 242 ae ae, 
9 473 —3.79 21 115 —3.89 
10 30 —4.06 2, 65 —shssif/ 
11 98 —1.83 23 311 —1.58 
12 41 405 24 61 soul 


We would like to estimate the mean abundance of both nitrogen and 
carbon 13 in the population of diamonds represented by this sample. 
Examine the data for nitrogen. Can we use at confidence interval for mean 
nitrogen? Explain your answer. Give a 90% confidence interval if you think 
the result can be trusted. 

20.14 Diamonds, Continued. Examine the data in Table 20.2 on abundance of 
carbon 13. Can we use at confidence interval for mean carbon 13? Explain 
your answer. Give a 90% confidence interval if you think the result can be 
trusted. 83] DMNDs 


=) 
= 


In Section 20.1, we defined the standard error of a statistic to be an estimate of 
the standard deviation of the statistic. For the sample mean, x, the standard devia- 
tion is o/Vn. A natural estimate is s/n, which is commonly referred to as the 
standard error of x in textbooks and computer output. 


*This material can be skipped if you prefer not to cover resampling methods. 
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We can also use resampling methods to estimate the standard deviation of 
many sample statistics, such as the sample mean or sample median. The basic idea 
is to take many SRSs from a population, compute the sample statistic for each, 
and then compute the standard deviation of these values. This is an estimate 
of the standard deviation of the sample statistic, sometimes referred to as the 
resampling estimate of the standard error. (Remember, the standard deviation of 
a sample statistic is the standard error. There is no need to divide the standard 
deviation of a sample statistic by Vn.) In practice, we typically do not know the 
exact distribution of the population. For example, as is the case in this chapter, 
we may know that the population is (approximately) Normal, but not know the 
mean pm and standard deviation o. For purposes of resampling, in such cases we 
estimate these unknown features of the population distribution from data. The 
next example demonstrates this process. 


In Example 17.3, the sweetness losses for a new cola, as measured by 10 trained 
tasters, were: 


20 04 07 20 -O04 2.2 —-1.3 1.2 1.1 2.3 


We assumed these values could be regarded as an SRS from a Normal population. 
The average sweetness loss is given by the sample mean x = 1.02. The sample 
standard deviation is s = 1.20. For purposes of resampling, we therefore assume 
the population distribution is Normal with mean mw = 1.02 and standard deviation 
o = 1.20. 

Now draw 1000 SRSs of size 10 from a Normal population with mean wp = 1.02 
and standard deviation @ = 1.20. (We used software to do this, and in practice, one 
uses software to apply resampling methods.) The results are displayed in Figure 20.9. 
The standard deviation of these 1000 sample means is found to be 0.3786. This is 
our resampling estimate of the standard error of x for an SRS of size 10. The formula 
in Section 20.1 tells us that the standard error is s/Vn = 1.20/V/10 = 0.3795. The 
values computed by resampling and by the formula are in close agreement. Ml 


Number of samples 


=l0) —073 =O -—O4' 02 00 O28 @ft 06 0.8 1.0 
Sample mean 


FIGURE 20.9 

A histogram, for Example 20.6, of the 
means of 1000 SRS&s of size 10 from 
a Normal population with mean w = O 
and standard deviation o = 1.20. 
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For the sample mean x we have a simple formula, s/n, for the standard error 
and there is no need to use resampling to estimate the standard error. However, 
what can we do when we don’t have a simple formula for the standard error of a 
statistic? For example, suppose we want to estimate the standard error of a sample 
median. The close agreement between the resampling estimate and the formula in 
Example 20.6 supports our claim that resampling can be used to estimate standard 
errors for statistics such as sample medians. The next example illustrates how to 


do this. 


For the data of Example 20.6, the sample median is 1.15. We have not been 
given a formula for the standard error of a sample median. Using the 1000 SRSs 
in Example 20.6, we can calculate the median of each sample and compute the 
standard deviation of these 1000 medians. We find that this standard deviation is 
0.442. This is our resampling estimate of the standard error of the sample median 
in this cola example. 


For a Normal population, the mean and median are the same. Thus, the sample 
median could be used to estimate the center of a Normal population. However, we 
prefer to use X to estimate the center. Example 20.7 helps us see why. The standard 
error of the sample median in Example 20.7 is larger than the standard error of the 
sample mean in Example 20.6. In other words, the sample mean produces a “more 
precise” estimate of the center (in the sense of small standard deviation) than the 
sample median. 

Resampling estimates of standard errors can be used to construct confidence 
intervals. However, there are several methods for generating resampling-based con- 
fidence intervals and these more advanced methods are beyond the scope of this 
book. 

We close with some cautions concerning resampling. First, we must resample in 
the same manner that we obtain our data. If our data are obtained by an SRS, we 
resample by taking repeated SRSs from the population distribution determined by 
the null hypothesis and/or estimates of any population parameters that determine 
the population distribution. 

Second, resampling provides an estimate of a standard error. Repeat the resam- 
pling and you will obtain a different estimate. The accuracy of the estimate is im- 
proved by taking a larger number of samples to estimate the sampling distribution. 

Third, we should always check assumptions, such as Normality, about the popu- 
lation distribution. In Examples 20.6 and 20.7, we assumed that the population dis- 
tribution was Normal. Do the data support this assumption? You will explore this 
in Exercise 20.44. 

Finally, resampling requires the use of software. Some software packages are 
more convenient for resampling than others. The online technology manuals and 
videos discuss how to do resampling for several packages. 


() LaunchPad Online Resources 


* The StatBoards video, Resampling and Standard Errors, discusses examples of 
using resampling to estimate standard errors. 
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20.15 Estimating a Standard Error. In Example 17.2, we saw that the sweet- 
ness losses for a cola currently on the market, as measured by 10 trained 
tasters, were 


hier (O4> O15 210) ik lil is) Ohi —@3) iz 


We assumed these values could be regarded as an SRS from a Normal popu- 
lation and saw that the average sweetness loss was x = 0.3. The sample 
standard deviation is s = 1.2. 


(a) Compute the standard error of x = 0.3 for these data. 


(b) Use the same process as in Example 20.6 to estimate the standard error. 
Does this agree with your answer in part (a)? 


20.16 Estimating a Standard Error. Use your 1000 SRSs in Exercise 20.15 to 
estimate the standard error of the sample median for the 10 scores. How 
does this standard error compare with that for x in Exercise 20.15? 


CHAPTER 20 SUMMARY 


Chapter Specifics 


m ‘Tests and confidence intervals for the mean fz of a Normal population are based on 
the sample mean x of an SRS. Because of the central limit theorem, the resulting pro- 
cedures are approximately correct for other population distributions when the sample 
is large. 


mu The standardized sample mean is the one-sample z statistic 


ie ae 
a/Vn 


If we knew o, we would use the z statistic and the standard Normal distribution. 


m In practice, we do not know a. Replace the standard deviation a/Vn of X by the 
standard error s/n to get the one-sample t statistic 


s/Vn 


The t statistic has the t distribution with n — 1 degrees of freedom. 


t 


m ‘There is a t distribution for every positive degrees of freedom. All are symmetric 
distributions similar in shape to the standard Normal distribution. The t distribution 
approaches the N(0, 1) distribution as the degrees of freedom increase. 


m A level C confidence interval for the mean jz of a Normal population is 


xt tt— 


= 


The critical value t* is chosen so that the t curve with n — 1 degrees of freedom has 
area C between —t* and t*. 


m Significance tests for Ho: uz = po are based on the t statistic. Use P-values or fixed 
significance levels from the t(n — 1) distribution. 


m Use these one-sample procedures to analyze matched pairs data by first taking the 
difference within each matched pair to produce a single sample. 
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m The t procedures are quite robust when the population is non-Normal, especially for 
larger sample sizes. The t procedures are useful for non-Normal data when n = 15 
unless the data show outliers or strong skewness. When n = 40, the t procedures can 
be used even for clearly skewed distributions. 


m Standard errors can be estimated using resampling. This is a useful method when deal- 
ing with statistics and parameters other than means. 


Link It 


In Chapters 16 to 18, we began our study of inference. We focused on inference for a 
population mean based on a sample from a Normal population. We included the unrealistic 
assumption that we knew the population standard deviation a. The reason was to make the 
underlying mathematics simpler. We were able to use what we learned about the Normal 
distribution in Chapters 3 and 12 and what we learned about the sampling distribution of 
the sample mean in Chapter 15 to construct confidence intervals for and conduct hypoth- 
esis tests about the population mean. Computing P-values and sample sizes was possible 
using what we had learned about Normal probability calculations. 

In this chapter, we continue our study of inference about a population mean based 
on a sample from a Normal population in the more realistic setting that we do not know 
o. The basic ideas of Chapters 16 to 18 still apply, but now we use the t distribution 
rather than the Normal distribution. Unfortunately, the mathematics associated with 
the t distribution is more complicated than that associated with the Normal distribution. 
We must rely on approximations from tables or statistical software to calculate P-values 
(and determining sample sizes is also more complicated when we use the t distribution). 
As we continue our study of statistical inference in realistic settings, statistical software 
will be invaluable. 

As we saw in Chapter 9, statistical studies comparing two or more groups are preferable 
to one-sample procedures if we want to demonstrate that a treatment causes an observed 
response. As a first step toward developing methods for comparing means from two popula- 
tions, we considered matched pairs studies (also discussed in Chapter 9) and saw that by 
looking at differences, we could use our one-sample t procedures to compare two means. In 
the next chapter, we consider inference for comparing the means of two populations when 
we have independent samples from the two populations. This will expand our tools for 
doing statistical inference in settings that we encounter in practice. 


(9) LaunchPad Online Resources 


If you are having difficulty with any of the sections of this chapter, these online 
resources should help prepare you to solve the exercises at the end of this chapter. 


* StatTutor starts with a video review of each section and asks a series of 
questions to check your understanding. You may find this helpful if you 
need additional help understanding any of the material in this chapter. 


* LearningCurve provides you with a series of questions about the chapter 
geared to your level of understanding. 


CHECK YOUR SKILLS 
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We prefer the t procedures to the z procedures for 
inference about a population mean because 


(a) x can be used only for large samples. 

(b) z requires that you know the population stan- 
dard deviation a. 

(c) x requires that you can regard your data as an 
SRS from the population. 


You are testing Hp: w = 100 against H,: w < 100 
based on an SRS of nine observations from a Normal 
population. The data give x = 98 and s = 3. The 
value of the t statistic is 

(a) —6. (b) —2. (c) —98. 

You are testing Hp: w = 100 against H,: w > 100 
based on an SRS of 16 observations from a Normal 
population. The t statistic ist = 2.13. The degrees of 
freedom for the t statistic are 


(a) 17. (b) 16. (@ns: 
The P-value for the statistic in Exercise 20.19 


(a) falls between 0.05 and 0.10. 
(b) falls between 0.01 and 0.05. 
(c) is less than 0.01. 


You have an SRS of six observations from a Normally 
distributed population. What critical value would 
you use to obtain an 80% confidence interval for the 
mean p of the population? 


(a) 1.440 (b) 1.476 (c) 2.015 


You are testing Hy: w = 0 against H,: w # 0 based 
on an SRS of six observations from a Normal popu- 
lation. What values of the t statistic are statistically 
significant at the a = 0.001 level? 


(a) t > 6.869 
(b) t < —6.869 ort > 6.869 
(Qt 5.893 ont 55893 


Twenty-five adult citizens of the U.S. were asked to 
estimate the average income of all U.S. households. 


Read carefully. You read in the report of a psy- 
chology experiment: “Separate analyses for our 
two groups of 12 participants revealed no overall 
placebo effect for our student group (mean = 0.08, 
SD = 0.37, t(11) = 0.49) and a significant effect 
for our non-student group (mean=0.35, 
SD = 0.37, t(11) = 3.25, p < 0.01).”!° The null 
hypothesis is that the mean effect is zero. What are 
the correct values of the two t statistics based on 
the means and standard deviations? Compare each 
correct t -value with the critical values in Table C. 


20.24 


20.25 


20.26 


20.28 


Exercises 


477 


The mean estimate was xX = $45,000 and 
s = $15,000. (Note: the actual average household 
income at the time of the study was about $68,000.) 
Assume the 25 adults in the study can be considered 
an SRS from the population of all adult citizens of 
the U.S. A 95% confidence interval for the mean es- 
timate of the average income of all U.S. households is 


(a) $38,808 to $51,192. 
(b) $39,120 to $50,880. 
(c) $39,867 to $50,133. 


Which of the following would cause the most worry 
about the validity of the confidence interval you cal- 
culated in Exercise 20.23? 


(a) You notice that there is a clear outlier in the data. 
(b) A stemplot of the data shows a mild right-skew. 
(c) You do not know the population standard 
deviation o. 


Which of these settings does not allow use of a 
matched pairs t procedure? 


(a) You interview both spouses in 400 married 
couples and ask each about the average number of 
minutes each day they spend using social media. 

(b) You interview a sample of 225 unmarried male 
students and another sample of 225 unmarried female 
students and ask each about the average number of 
minutes each day they spend using social media. 

(c) You interview 100 female students in their 
freshman year and again in their senior year and ask 
each about the average number of minutes each day 
she spends using social media. 


Because the t procedures are robust, the most 
important condition for their safe use is that 


(a) the population standard deviation o@ is known. 
(b) the population distribution is exactly Normal. 
(c) the data can be regarded as an SRS from the 
population. 


What can you say about the two-sided P-value in 
each case? 


Body mass index of young women. In Example 16.1 
(page 375), we developed a 95% zx confidence interval 
for the mean body mass index (BMI) of women aged 
20 to 29 years, based on a national random sample of 
654 such women. We assumed there that the popula- 
tion standard deviation was known to be o = 7.5. In 
fact, the sample data had mean BMI x = 26.8 and 
standard deviation s = 7.42. What is the 95% t confi- 
dence interval for the mean BMI of all young women? 
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Reading scores in Dallas. The Trial Urban District 
Assessment (TUDA) is a government-sponsored 
study of student achievement in large urban school 
districts. TUDA gives a reading test scored from 0 
to 500. A score of 243 is a “basic” reading level and 
a score of 281 is “proficient.” Scores for a random 
sample of 1400 eighth-graders in Dallas had x = 248 
with standard error 1.0.1” 


(a) We don’t have the 1400 individual scores, but 
use of the t procedures is surely safe. Why? 

(b) Give a 99% confidence interval for the mean 
score of all Dallas eighth-graders. (Be careful: 
the report gives the standard error of x, not the 
standard deviation s.) 

(c) Urban children often perform below the basic 
level. Is there good evidence that the mean for all 
Dallas eighth-graders is more than the basic level? 


Color and cognition. In a randomized comparative 
experiment on the effect of color on the performance 
of a cognitive task, researchers randomly divided 69 
subjects (27 males and 42 females ranging in age from 
17 to 25 years) into three groups. Participants were 
asked to solve a series of six anagrams. One group 
was presented with the anagrams on a blue screen, 
one group saw them on a red screen, and one group 
had a neutral screen. The time, in seconds, taken to 
solve the anagrams was recorded. The paper report- 
ing the study gives x = 11.58 and s = 4.37 for the 
times of the 23 members of the neutral group.'® 


(a) Give a 95% confidence interval for the mean time in 
the population from which the subjects were recruited. 
(b) What conditions for the population and the 
study design are required by the procedure you used 
in part (a)? Which of these conditions are impor- 
tant for the validity of the procedure in this case? 


The placebo effect. The placebo effect is particu- 
larly strong in patients with Parkinson’s disease. To 
understand the workings of the placebo effect, sci- 
entists measure activity at a key point in the brain 
when patients receive a placebo that they think is an 
active drug and also when no treatment is given.” 
The same six patients are measured both with and 
without the placebo, at different times. 


(a) Explain why the proper procedure to compare 
the mean response to placebo with control (no 
treatment) is a matched pairs t test. 

(b) The six differences (treatment minus control) had 
x = —0.326 ands = 0.181. Is there significant evi- 
dence of a difference between treatment and control? 


What do you make of Mitt? Twenty-nine college 
students, identified as having a positive attitude 
about Mitt Romney as compared to Barack Obama 
in the 2012 presidential election, were asked to rate 
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20.34 


how trustworthy the face of Mitt Romney appeared, 
as represented in their mental image of Mitt Rom- 
ney’s face. Ratings were on a scale of 0 to 7, with 0 
being “not at all trustworthy” and 7 being “extremely 
trustworthy.” Here are the 29 ratings: Ey) 


ry ai 

4.2 4.9 5.7 4.2 3.9 3.2 4.1 5.0 5.0 4.6 

A Se) SS) Sys} xe) Poy SHO) 3h,5} Sef 
(a) Suppose we can consider this an SRS of all U.S. 
college students. Make a stemplot. Is there any sign 
of major deviation from Normality? 
(b) Give a 95% confidence interval for the mean rating 
(c) Is there significant evidence at the 5% level that 
the mean rating is greater than 3.5 (a neutral rating)? 


Exhaust from school buses. Ina study of exhaust 
emissions from school buses, the pollution intake 
by passengers was determined for a sample of nine 
school buses used in the Southern California Air 
Basin. The pollution intake is the amount of ex- 
haust emissions, in grams per person, that would be 
inhaled while traveling on the bus during its usual 
18-mile trip on congested freeways from South 
Central LA to a magnet school in West LA. (As 
a reference, the average intake of motor emissions 
of carbon monoxide in the LA area is estimated to 
be about 0.000046 grams per person.) Here are the 
amounts for the nine buses when driven with the 
windows open:’! ely) EMIT 


15 033) 1040033 SSO SSO SO 0NN0-35 


(a) Make a stemplot. Are there outliers or strong 
skewness that would preclude use of the t procedures? 
(b) A good way to judge the effect of outliers is to 

do your analysis twice, once with the outliers and a 
second time without them. Give two 90% confidence 
intervals, one with all the data and one with the out- 
liers removed, for the mean pollution intake among 
all school buses used in the Southern California Air 
Basin that travel the route investigated in the study. 
(c) Compare the two intervals in part (b). What is 
the most important effect of removing the outliers? 


A big toe problem. Hallux abducto valgus (call it 
HAV) is a deforma- : 


tion of the big toe 
that often requires 
surgery. Doctors used 
X rays to measure the 
angle (in degrees) 
of deformity in 38 
consecutive patients 
under the age of 21 
who came to a medi- 
cal center for surgery 
to correct HAV. The 
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angle is a measure of the seriousness of the defor- 
mity. Here are the data:” 2, 


Ms) BYE fs) BH SX) Py 725) lke) SO) oy Aci 113) 740) 
ML iP I Bil Ak WS Sy WS) PEL DD PHO) Ns} 73 
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It is reasonable to regard these patients as a random 
sample of young patients who require HAV surgery. 
Carry out the Solve and Conclude steps for a 95% 
confidence interval for the mean HAV angle in the 
population of all such patients. 


An outlier’s effect. Our bodies have a natural elec- 
trical field that is known to help wounds heal. Does 
changing the field strength slow healing? A series of 
experiments with newts investigated this question. In 
one experiment, the two hind limbs of 12 newts were 
assigned at random to either experimental or control 
groups. This is a matched pairs design. The electrical 
field in the experimental limbs was reduced to zero 
by applying a voltage. The control limbs were left 
alone. Here are the rates at which new cells closed 
a razor cut in each limb, in micrometers per hour:” 
eq) wewrs 


Newt 
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Control limb 


Experimental 


limb 
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(a) Why is this a matched pairs design? Explain 
your answer. 

(b) Make a stemplot of the differences between 
limbs of the same newt (control limb minus 
experimental limb). There is a high outlier. 

(c) A good way to judge the effect of an outlier 
is to do your analysis twice, once with the out- 
lier and a second time without it. Carry out two 
t tests to see if the mean healing rate is signifi- 
cantly lower in the experimental limbs, with one 
test including all 12 newts and another omitting 
the outlier. What are the test statistics and their 
P-values? Does the outlier have a strong influence 
on your conclusion? 


An outlier’s effect. A good way to judge the effect 
of an outlier is to do your analysis twice, once with 
the outlier and a second time without it. The data 
in Exercise 20.34 follow a Normal distribution quite 
closely except for one patient with HAV angle 50 
degrees, a high outlier. £j))) Biatoe 

(a) Find the 95% confidence interval for the popu- 
lation mean based on the 37 patients who remain 
after you drop the outlier. 

(b) Compare your interval in part (a) with your 
interval from Exercise 20.34. What is the most 
important effect of removing the outlier? 
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Men of few words? Researchers claim that women 
speak significantly more words per day than men. 
One estimate is that a woman uses about 20,000 
words per day while a man uses about 7,000. To in- 
vestigate such claims, one study used a special device 
to record the conversations of male and female uni- 
versity students over a four-day period. From these 
recordings, the daily word counts of the 20 men in 
the study were determined. Here are their daily word 
counts:”4 TALKING 
28,408 10,084 
10,575 12,880 


8,918 6,495 8,153 7,015 4,429 
10,054 3,998 12,639 10,974 5,255 


(a) Examine the data. Is it reasonable to use the 

t procedures (assume these men are an SRS of all 
male students at this university)? 

(b) If your conclusion in part (a) is “yes,” do the 
data give convincing evidence that the mean 
number of words per day of men at this university 
differs from 7,000? 


Genetic engineering for cancer treatment. Here’s a 
new idea for treating advanced melanoma, the most 
serious kind of skin cancer: genetically engineer white 
blood cells to better recognize and destroy cancer cells, 
then infuse these cells into patients. The subjects in a 
small initial study were 11 patients whose melanoma 
had not responded to existing treatments. One ques- 
tion was how rapidly the new cells would multiply af- 
ter infusion, as measured by the doubling time in days. 
Here are the doubling times:” £y))) cncrTRT 
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21,688 
17,799 


37,786 
13,182 
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(a) Examine the data. Is it reasonable to use the t 
procedures? 

(b) Give a 90% confidence interval for the mean 
doubling time. Are you willing to use this interval 
to make an inference about the mean doubling time 
in a population of similar patients? Explain your 
reasoning. 


Genetic engineering for cancer treatment, contin- 
ued. Another outcome in the cancer experiment 
described in Exercise 20.38 is measured by a test for 
the presence of cells that trigger an immune response 
in the body and so may help fight cancer. Here are 
data for the 11 subjects: counts of active cells per 
100,000 cells before and after infusion of the modi- 
fied cells. The difference (after minus before) is the 
response variable. £jj)| MORECAN 


Before 


After 
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Difference 27 7 


O 215 20 700 13 510 34 86 108 
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(a) Explain why this is a matched pairs design. 
(b) Examine the data. Is it reasonable to use the t 
procedures? 

(c) If your conclusion in part (a) is “yes,” do the 
data give convincing evidence that the count of 
active cells is higher after treatment? 


20.40 Kicking a helium-filled football. Does a football 
filled with helium travel farther than one filled with 
ordinary air? To test this, the Columbus Dispatch con- 
ducted a study. Two identical footballs, one filled with 
helium and one filled with ordinary air, were used. A 
casual observer was unable to detect a difference in 
the two footballs. A novice kicker was used to punt 
the footballs. A trial consisted of kicking both footballs 
in a random order. The kicker did not know which 
football (the helium-filled or the air-filled football) he 
was kicking. The distance of each punt was recorded. 
Then another trial was conducted. A total of 39 trials 
were run. Here are the data for the 39 trials, in yards 
that the footballs traveled. The difference (helium mi- 
nus air) is the response variable.”* £j, 
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(a) Examine the data. Is it reasonable to use the t 
procedures? 
(b) If your conclusion in part (a) is “yes,” do the 
data give convincing evidence that the helium-filled 
football travels farther than the air-filled football? 
20.41 Growing trees faster. The concentration of car- 


bon dioxide (CO,) in the atmosphere is increasing 
rapidly due to our use of fossil fuels. Because plants 
use CO, to fuel photosynthesis, more CO, may cause 
trees and other plants to grow faster. An elaborate 
apparatus allows researchers to pipe extra CO, to a 
30-meter circle of forest. They selected two nearby 
circles in each of three parts of a pine forest and ran- 
domly chose one of each pair to receive extra CO). 
The response variable is the mean increase in base 
area for 30 to 40 trees in a circle during a growing 
season. We measure this in percent increase per 
year. Here are one year’s data:”’ £yy, 
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PAIR 


CONTROL PLOT TREATED PLOT TREATED — CONTROL 


1 


Shey) 10.587 0.835 


2. 


7.263 OMA 1.981 


3 
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5.742 8.675 DS) 


(a) State the null and alternative hypotheses. Ex- 
plain clearly why the investigators used a one-sided 
alternative. 

(b) Carry out a test and report your conclusion in 
simple language. 

(c) The investigators used the test you just carried 
out. Any use of the t procedures with samples this 
size is risky. Why? 

Fungus in the air. The air in poultry-processing 
plants often contains fungus spores. Inadequate venti- 
lation can affect the health of the workers. The prob- 
lem is most serious during the summer. To measure the 
presence of spores, air samples are pumped to an agar 
plate and “colony-forming units (CFUs)” are counted 
after an incubation period. Here are data from two 
locations in a plant that processes 37,000 turkeys per 


DAY i DAY2 DAY3_ DAY 4 


Kill room 
Processing 


3175 
wy) 


2526 
141 


1763 
362 


1090 
224 


Kill room — Processing 


2646 2385 1401 866 
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(a) Explain carefully why these are matched pairs data. 
(b) The spore count is clearly higher in the kill 
room. Give sample means and a 90% confidence 
interval to estimate how much higher. Be sure to 
state your conclusion in plain language. 

(c) You will often see the t procedures used for data 
like these. You should regard the results as only 
rough approximations. Why? 


Weeds among the corn. Velvetleaf is a particularly 
annoying weed in corn fields. It produces lots of seeds, 
and the seeds wait in the soil for years until conditions 
are right. How many seeds do velvetleaf plants pro- 
duce? Here are counts from 28 plants that came up in 
acorn field when no herbicide was used:” fy))) weeps 
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2450 2504 2114 1110 2137 8015 1623 1531 2008 1716 
721 863 1136 2819 1911 2101 1051 218 1711 164 
2228 363 5973 1050 1961 1809 130 880 


We would like to give a confidence interval for the 
mean number of seeds produced by velvetleaf plants. 
Alas, the t interval can’t be safely used for these data. 
Why not? 


20.44 Sweetening colas. Cola makers test new recipes 
for loss of sweetness during storage. Trained tasters 
rate the sweetness before and after storage. Here are 
the sweetness losses (sweetness before storage minus 


sweetness after storage) found by 10 tasters for one 


new cola recipe: Egy) coLa 


HO) O4 Oe 20) =O 2A =iles) ioe ileil 23 


Take the data from these 10 carefully trained tasters as 
an SRS from a large population of all trained tasters. 


(a) Use these data to see if there is good evidence 
that the cola lost sweetness. 

(b) It is not uncommon to see the t procedures used 
for data like these. However, you should regard the 
results as only rough approximations. Why? 


20.45 How much oil? How much oil will ultimately be 
produced by wells in a given field is key informa- 
tion in deciding whether to drill more wells. Here 
are the estimated total amounts of oil recovered 
from 64 wells in the Devonian Richmond Dolomite 
area of the Michigan basin, in thousands of barrels:*° 


Ey) on 


Alls Biol 46.4 42.7 504 97.7 103.1 Sie 
43.4 69.5 1565 346 37.9 12.9 2.5 31.4 
79.5 26.9 S14 7 32-9 196 24.9 118.2 
BYhn, — sheyll 47.6 54.2 63.1 69.8 DIA 65.6 
56.4 49.4 44.9 34.6 92.2 37.0 58.8 P63) 
36.6 64.9 148 176 29.1 614 38.6 325) 
WO sks) ON) aa NO) ad 33.7 81.1 
12 eZ On 30.5 fell OTL AUS) 3.0 2.0 


Take these wells to be an SRS of wells in this area. 


(a) Give a 95% t confidence interval for the mean 
amount of oil recovered from all wells in this area. 
(b) Make a graph of the data. The distribution is 
very skewed, with several high outliers. A computer- 
intensive method that gives accurate confidence 
intervals without assuming any specific shape for the 
distribution gives a 95% confidence interval of 40.28 
to 60.32. How does the t interval compare with this? 
Should the t procedures be used with these data? 


20.46 E. coli in swimming areas. To investigate water 


quality, the Columbus Dispatch took water samples 
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at 16 Ohio State Park swimming areas in central 
Ohio. Those samples were taken to laboratories and 
tested for E. coli, which are bacteria that can cause 
serious gastrointestinal problems. If a 100-milliliter 
sample (about 3.3 ounces) of water contains more 
than 130 E. coli bacteria, it is considered unsafe. 
Here are the E. coli levels found by the laborato- 
ties:?! & 


291.0 190.4 47.0 86.0 44.0 18.9 1.0 50.0 
10.9 py) debs) SHO SIH) SAID) hs18) 


Take these water samples to be an SRS of the water 
in all swimming areas in central Ohio. 

(a) Are these data good evidence that on aver- 

age the E. coli levels in these swimming areas were 
unsafe? 

(b) Make a graph of the data. The distribution is 
very skewed. Another method that gives P-values 
without assuming any specific shape for the distri- 
bution gives a P-value of 0.9997 for the question in 
part (a). How does the one-sample t test compare 
with this? Should the t procedures be used with 
these data? 


The following exercises ask you to answer questions from data 
without having the details outlined for you. The four-step pro- 
cess is illustrated in Examples 20.2, 20.3, and 20.4. The exercise 
statements give you the State step. Follow the Plan, Solve, and 
Conclude steps in your work. 


20.47 Natural weed control? Fortunately, we aren't really 
42 interested in the number of seeds velvetleaf plants 

produce (see Exercise 20.43). The velvetleaf seed 
beetle feeds on the seeds and might be a natural 
weed control. Here are the total seeds, seeds infected 
by the beetle, and percent of seeds infected for 
28 velvetleaf plants: {jj}) woctrRL 


Seeds 2450 2504 2114 1110 2137 8015 1623 1531 2008 1716 
Infected 135 101 76 24 121 189 31 44 73 12 
Percent | 5.5 4.0 3.6 22 5.7 24 19 2.9 3.6 0.7 
Seeds 721 863 1136 2819 1911 2101 1051 218 1711 164 
Infected 27 40 41 79 82 85 42 0 64 i 
Percent | 3.7 46 36 28 43 40 40 0.0 3.7 43 
Seeds 2228 363 5973 1050 1961 1809 130 880 
Infected} 156 31 240 91 137 92 5 23 
sree | 70) 5) 40) eh AO SIL Sher AA) 


Do a complete analysis of the percent of seeds infect- 
ed by the beetle. Include a 90% confidence interval 
for the mean percent infected in the population of 
all velvetleaf plants. Do you think that the beetle is 
very helpful in controlling the weed? Why is analyz- 
ing percent of seeds infected more useful than ana- 
lyzing number of seeds infected? 
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Recruiting T cells. There is evidence that cytotoxic 
T lymphocytes (T cells) participate in controlling 
tumor growth and that they can be harnessed to use 
the body’s immune system to treat cancer. One study 
investigated the use of a T cell-engaging antibody, 
blinatumomab, to recruit T cells to control tumor 
growth. The data below are T cell counts (1000 per 
microliter) at baseline (beginning of the study) and 
after 20 days on blinatumomab for six subjects in the 
study.” The difference (after 20 days minus base- 
line) is the response variable. TCELLS 


Baseline 0.04 0.02 0.00 0.02 0.38 0.33 
After 20 days 0.28 = 0.47 130) OLS) Ia O44: 
Difference 0.24 0.45 1310 73) ey OL ITT 
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Do the data give convincing evidence that the mean 
count of T cells is higher after 20 days on blinatu- 
momab? 


Recruiting T cells, continued. Give a95% confidence 
interval for the mean difference in T cell counts (after 
20 days minus baseline) in Exercise 20.48 £jy,| TcELLs 


Mutual funds performance. Mutual funds often 
compare their performance with a benchmark 
provided by an “index” that describes the per- 
formance of the class of assets in which the fund 
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(a) Explain clearly why the matched pairs t test is 
the proper choice to answer this question. 

(b) Do a complete analysis that answers the ques- 
tion posed. 


Right versus left. The design of controls and instru- 
ments affects how easily people can use them. Timo- 
thy Sturm investigated this effect in a course project, 
asking 25 right-handed students to turn a knob (with 
their right hands) that moved an indicator by screw 
action. There were two identical instruments: one 
with a right-hand thread (the knob turns clockwise), 
and the other with a left-hand thread (the knob 
turns counterclockwise). Table 20.4 gives the times 
in seconds each subject took to move the indicator a 
fixed distance.*? jy) RTLFT 


(a) Each of the 25 students used both instruments. 
Explain briefly how you would use randomization in 
arranging the experiment. 

(b) The project hoped to show that right-handed 
people find right-hand threads easier to use. Do an 
analysis that leads to a conclusion about this issue. 


invests. For example, the Vanguard International RIGHT LEFT RIGHT LEFT 
Growth Fund benchmarks its performance against SUBJECT THREAD THREAD | SUBJECT THREAD THREAD 
the Spliced International Index. Table 20.3 gives 1 13) 137 14 107 87 
annual returns (percent) for the fund and the in- 2 105 105 15 118 166 
dex. Does the fund’s performance differ signifi- 3 130 133 16 103 146 
cantly from that of its benchmark? £)))) Meunp 4 101 108 17 111 1a 
5 138 115 18 104 135 
6 118 170 19 111 112 
q 87 103 20 89 93 
FUND INDEX FUND _—INDEX 8 116 145 21 18 76 
RETURN RETURN RETURN RETURN 9 75 78 DD: 100 116 
YEAR (%) (%) | YEAR = (%) (%) 10 96 107 23 89 78 
1984 1402 7.38 1999 26.34 26.96 iil 122 84 24 85 101 
1985 56.94 56.16 | 2000 —8.60 —14.17 12 103 148 25 88 123 
1986 56.71 69.44 | 2001 —18.92 —21.44 13 116 147 
1987 12.48 24.63 | 2002 -—17.79 —15.94 
1988 11.61 DX || Deis 34.45 38.59 20.52 Comparing two drugs. Makers of generic drugs 
1989 24.76 10.54 | 2004 18.95 20.25 must show that they do not differ significantly from 
1990 = 105 2345 2005 15.00 13.54 | the “reference” drugs that they imitate. One aspect 
1991 4.74 12.13 | 2006 25.92 26.34 in which drugs might differ is their extent of absorp- 
1992 579  —12.17 | 2007 15.98 11.17 tion in the blood. Table 20.5 gives data taken from 
1993 44.74 32.56 | 2008 44.94. 43.38 20 healthy nonsmoking male subjects for one pair of 
1994 0.76 7.78 | 2009 41.63 31.78 drugs.** This is a matched pairs design. Numbers 1 to 
* - : : 20 were assigned at random to the subjects. Subjects 
1995 14,89 pod | LU) ibe ol 1 to 10 received the generic drug first, followed by 
1996 14.65 6.05 | 2011 —13.68  — 13.71 the reference drug. Subjects 11 to 20 received the 
1997 4.12 178 | 2012 20.01 16.83 reference drug first, followed by the generic drug. In 
1998 16.93 20.00 | 2013 22.95 15.29 all cases, a washout period separated the two drugs 


so that the first had disappeared from the blood be- 
fore the subject took the second. By randomizing 
the order, we eliminate the order in which the drugs 
were administered from being confounded with the 
difference in the absorption in the blood. Do the 
drugs differ significantly in the amount absorbed in 


the blood? 


REFERENCE GENERIC 
SUBJECT DRUG DRUG 
15 4108 M55) 
3 2526 1138 
9 2779 1613 
13 3852 2254 
1) 1833 1310 
8 2463 2120 
18 2059 1851 
20 1709 1878 
17 1829 1682 
2 2594 2613 
4 2344 2738 
16 1864 2302 
6 1022 1284 
10 2256 3052 
5 938 1287 
q 1339 1930 
14 1262 1964 
11 1438 2549 
il 1735 3340 
19 1020 3050 
20.53 Practical significance? Give a 90% confidence in- 


terval for the mean time advantage of right-hand over 
left-hand threads in the setting of Exercise 20.51. Do 
you think that the time saved would be of practical 
importance if the task were performed many times— 
for example, by an assembly-line worker? To help an- 
swer this question, find the mean time for right-hand 
threads as a percent of the mean time for left-hand 
threads. £y 
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20.54 Bad weather, bad tips? As part of the study of tip- 


18.0 
17.0 


20.55 


ping in a restaurant that we met in Example 16.3 
(page 381) the psychologists also studied the size of 
the tip in a restaurant when a message indicating 
that the next day’s weather would be bad was written 
on the bill. Here are tips from 20 patrons, measured 
in percent of the total bill: £y)) 


19.1 19.2 188 18.4 
S40 17s} BOLO) 02 


19.0 
18.8 


18.5 
18.0 23.2 


16.1 16.8 14.0 


18.2 19.4 


Do the data give convincing evidence that the mean 
percentage tip for all patrons of this restaurant when 
their bill contains a message that the next day’s 
weather will be bad is less than 20%? (Note that 20% 


is an often-recommended size for restaurant tips.) 


t versus z. If you examine Table C, you will notice 
that critical values of the t distribution get closer and 
closer to the corresponding critical values of the Nor- 
mal distribution as the number of degrees of freedom 
increases. You can see this by comparing the  criti- 
cal values at the bottom of Table C with the t critical 
values in the corresponding column. This suggests 
that, for very large sample sizes, inference based on 
the Normal probability calculations in Chapters 15 
and 16 (pretending @ is known) and inference based 
on the t distribution as discussed in this chapter (7 
is not known) may give essentially the same answer 
if sample sizes are large and we pretend that our es- 
timate of o is the true value of a. Many statistical 
software packages will calculate t probabilities. If you 
have access to such software, answer the following 
questions. 


(a) Use software to determine how large a sample 
size (or how many degrees of freedom) is needed 
for the critical value of the t distribution to be 
within 0.01 of the corresponding critical value of 
the Normal distribution for a 90%, 95%, and 99% 
confidence interval for a population mean. 

(b) Based on your findings, how large a sample 
size do you think is needed for inference using the 
Normal distribution and inference using the t distri- 
bution to give very similar results if o (both the true 
value and its estimate) is 1? If a (both the true value 
and its estimate) is 100? 


Matched pairs. Find an example of a matched pairs study on the web. The 
Journal of the American Medical Association (jama.amaassn.org), Science 
Magazine (www.sciencemag.org), the Canadian Medical Association Jour- 
nal (www.cmaj.ca), the Journal of Statistics Education (www. amstat.org/ 
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publications/jse), or perhaps the Journal of Quantitative Analysis in Sports 

(www. bepress.com/jqas) are possible sources. To help locate an article, 
look through the abstracts of articles. Once you find a suitable article, read it, 
and then briefly describe the study (including why it is a matched pairs study) 
and its conclusions. If P-values, means, standard deviations, and so on are re- 
ported, be sure to include them in your summary. Also, be sure to give the refer- 
ence (either the web link or the journal, issue, year, title of the paper, authors, 
and page numbers). 


Improper use of at procedure. Search the web for an example of an improper 
use of a t procedure. You might try using Google to do a search on “improper use 
of at test.” Summarize the study in which the t procedure was used and discuss 
how it was used improperly. Be sure to provide a link to your example or an 
appropriate reference. 
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omparing two populations or two treatments is one of the most common 
situations encountered in statistical practice. We call such situations two- 
sample problems. 


Two-Sample Problems 


~The goal of inference is to compare the responses to two treatments or to compare 
the characteristics of two populations. 


» We have a separate sample from each treatment or each population. 


A two-sample problem can arise from a randomized comparative experiment that 
randomly divides subjects into two groups and exposes each group to a different 
treatment. Comparing random samples separately selected from two populations is 
also a two-sample problem. Unlike the matched pairs designs studied earlier, there 
is no matching of the individuals in the two samples. The two samples are assumed 
to be independent and can be of different sizes. Inference procedures for two-sample 
data differ from those for matched pairs. Here are some typical two-sample problems: 


21.1 
21.2 


21.3 
21.4 
21.5 
21.6 
21.7 


21.8 


21.9 


Two-sample problems 


Comparing two population 
means 


Two-sample f procedures 
Using technology 

Robustness again 

Details of the t approximation* 


Avoid the pooled two-sample 
t procedures* 


Avoid inference about 
standard deviations* 


Permutation tests* 
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Does regular physical therapy help lower-back pain? A randomized experiment 
assigned patients with lower-back pain to two groups: 142 received an examina- 
tion and advice from a physical therapist; another 144 received regular physical 
therapy for up to five weeks. After a year, the change in their level of disability 
(0% to 100%) was assessed by a doctor who did not know which treatment the 
patients had received. 


A psychologist develops a test that measures social insight. He compares the so- 
cial insight of female college students with that of male college students by giv- 
ing the test to a sample of female students and a separate, independent sample 
of male students. 


A bank wants to know which of two incentive plans will most increase the use of its 
credit cards. It offers each incentive to independent random samples of credit card 
customers and compares the amounts charged during the following six months. m 


() LaunchPad Online Resources 


* The StatBoards video, Examples of Two-Sample Problems, discusses examples 


of two-sample problems. 


Apply Your Knowledge 


TQ privine WHILE 
Wik FASTING 


Muslims fast from sunrise to sunset 
during the month of Ramadan. 
Does this affect the rate of traffic 
accidents? Fasting can improve 
alertness, reducing accidents. Or it 
can cause dehydration, increasing 
accidents. Data from Turkey show 

a statistically significant increase, 
starting two weeks into Ramadan. 
Ah, but because Ramadan follows 

a lunar calendar, it cycles through 
the year. Perhaps accidents go down 
during a winter Ramadan (alertness) 
but go up during a summer Ramadan 
(longer fast and dehydration). Ask 
the statisticians this question and 
get their favorite answer: we need 
more data. 


Which data design? Each situation described in Exercises 21.1 to 21.4 requires infer- 


ence about a mean or means. Identify each as involving (1) a single sample, (2) matched 
pairs, or (3) two independent samples. The procedures of Chapter 20 apply to designs 


(1) and (2). We are about to learn procedures for design (3). 


21.1 How Sweet ItIs. Choose 10 agricultural plots of land located in different 
regions of the state. Plant half of each plot with one variety of sweet corn 
and the other with a different variety. Compare the sugar contents of both 
varieties of corn. 


21.2 Exercise and Memory. To examine the effect of physical exertion on the 
memory of a witnessed event, researchers randomly divided police officers 
into two groups. One group completed a high-intensity physical assault 
exercise and the other did no exercise. Both groups were then placed in a 
live, occupationally relevant scenario. After the scenario, the mean scores 
on an exam testing recall of what was witnessed during the scenario were 
compared. 


21.3 Smart Phone Battery Life. To check the lifetime of a new battery in a 
soon-to-be-released smartphone, an engineer surfed the web over 4G contin- 
uously, beginning with a fully charged battery, until the battery was drained. 
The engineer repeated this for a total of 10 times. He compared the results 
with the lifetime of 12.8 hours for the current battery used in the company’s 
smartphone. 


21.4 Smart Phone Battery Life, Continued. Another engineer is checking 
the lifetime of the new battery with a prototype that she just developed. 
She wants to know how the lifetimes of the new battery and her prototype 
compare. She measures the lifetime of the new battery and her prototype by 
continuously surfing the web over 4G until the battery is drained. She repeats 
this a total of 10 times for each type of battery. 
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Comparing two populations or the responses to two treatments starts with data 
analysis: make boxplots, stemplots (for small samples), or histograms (for larger 
samples) and compare the shapes, centers, and variabilities of the two samples. The 
most common goal of inference is to compare the average or typical responses in the 
two populations. When data analysis suggests that both population distributions are 
symmetric, and especially when they are at least approximately Normal, we want to 
compare the population means. Here are the conditions for inference about means. 


Conditions for Inference Comparing Two Means 


We have two SRSs, from two distinct populations. The samples are independent. 
That is, one sample has no influence on the other. Matching violates independence, 
for example. We measure the same response variable for both samples. 

Both populations are Normally distributed. The means and standard deviations 
of the populations are unknown. In practice, it is enough that the distributions have 
similar shapes and that the data have no strong outliers. 


Call the variable we measure x, in the first population and x, in the second, 
because the variable may have different distributions in the two populations. Here 
is how we describe the two populations: 


Population Variable Mean Standard Deviation 
1 X1 Mi O1 
z X2 Br O21 


There are four unknown parameters: the two means and the two standard devia- 
tions. The subscripts remind us which population a parameter describes. We want to 
compare the two population means, either by giving a confidence interval for their 
difference 4; — [2 or by testing the hypothesis of no difference, Hy: gm; = M2. The 
hypothesis of no difference is used when investigating whether a treatment has an effect. 

We use the sample means and standard deviations to estimate the unknown 
parameters. Again, subscripts remind us which sample a statistic comes from. Here 
is how we describe the samples: 


Sample Sample Sample 
Population Size Mean Standard Deviation 
1 nt Xt S] 
2 ny X) S2 


To make inferences about the difference 4; — jf, between the means of the two popu- 
lations, we start from the difference x, — x, between the means of the two samples. 


expend. James Levine and his collaborators at the Mayo Clinic investigated the link 


(STATE People gain weight when they take in more energy from food than they 42 
between obesity and energy spent on daily activity.’ 
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ACTIVE 
Lean Obese 
2|67 
72'|3)|567 
4/11236 
886410 | 5 
81/6 
FIGURE 21.1 


Back-to-back stemplot of the times 
spent walking or standing, for 
Example 21.2. 


Choose 20 healthy volunteers who don't exercise. Deliberately choose 10 who are 
lean and 10 who are mildly obese but still healthy. Attach sensors that monitor the 
subjects’ every move for 10 days. Table 21.1 presents data on the time (in minutes 
per day) that the subjects spent standing or walking, sitting, and lying down. Do lean 
and obese people differ in the average time they spend standing and walking? 


GROUP SUBJECT STAND/WALK SIT LIE 

Lean 1 511.100 370.300 555.500 
Lean 2 607.925 374.512 450.650 
Lean 3 319.212 582.138 537.362 
Lean 4 584.644 357.144 489.269 
Lean 5 578.869 348.994 514.081 
Lean 6 543.388 385.312 506.500 
Lean 7 677.188 268.188 467.700 
Lean 8 555.656 322.219 567.006 
Lean 9 374.831 537.031 531.431 
Lean 10 504.700 528.838 396.962 
Obese 11 260.244 646.281 521.044 
Obese 12 464.756 456.644 514.931 
Obese 13 367.138 578.662 563.300 
Obese 14 413.667 463.333 532.208 
Obese 15 347.375 567.556 504.931 
Obese 16 416.531 567.556 448.856 
Obese 17 358.650 621.262 460.550 
Obese 18 267.344 646.181 509.981 
Obese 19 410.631 572.769 448.706 
Obese 20 426.356 591.369 412.919 


)PLAN: Examine the data and carry out a test of hypotheses. We suspect in advance 
that lean subjects (Group 1) are more active than obese subjects (Group 2), so we 
test the hypotheses 


Ho: Mi = Pe 
Hq: py > peo 


BOGS oLe Are the conditions for inference met? The subjects are volun- 


teers, so they are not SRSs from all lean and mildly obese adults. The study tried to 
recruit comparable groups: all worked in sedentary jobs, none smoked or were tak- 
ing medication, and so on. Setting clear standards like these helps make up for the 
fact that we can’t reasonably get SRSs for so invasive a study. The subjects were not 
told that they were chosen from a larger group of volunteers because they did not 
exercise and were either lean or mildly obese. Because their willingness to volunteer 
isn’t related to the purpose of the experiment, we will treat them as two independent 
SRSs. 

A back-to-back stemplot (Figure 21.1) displays the data in detail. To make the 
plot, we rounded the data to the nearest 10 minutes and used 100s as stems and 


21.3  Two-Sample t Procedures 


10s as leaves. The distributions are a bit irregular, as we expect with just 10 obser- 
vations. There are no clear departures from Normality such as extreme outliers or 
skewness. The lean subjects as a group spend much more time standing and walking 
than do the obese subjects. Calculating the group means confirms this: 


Group n Meanx Std. Dev. s 
Group 1 (lean) 10 525.751 107.121 
Group 2 (obese) 10 373.269 67.498 


The observed difference in mean time per day spent standing or walking is 


X1 — X. = 525.751 — 373.269 = 152.482 minutes 


To complete the Solve step, we must learn the details of inference for comparing two 
means. m 
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To assess the significance of the observed difference between the means of our 
two samples, we follow a familiar path. Whether an observed difference is surprising 
depends on the variability of the observations as well as on the two means. Widely 
different means can arise just by chance if the individual observations vary a great 
deal. To take variation into account, we would like to standardize the observed 
difference x; — x, by dividing by its standard deviation. This standard deviation of 
the difference in sample means is 


ny n2 


This standard deviation gets larger as either population gets more variable, that is, 
as 1 OF G> increases. It gets smaller as the sample sizes n; and n; increase. 

Because we don’t know the population standard deviations, we estimate them by 
the sample standard deviations from our two samples. The result is the standard 
error, or estimated standard deviation, of the difference in sample means: 


Deg =a a 


When we standardize the estimate by dividing it by its standard error, the result is 
the two-sample t statistic: 


X1 —~ X2 
t SO 
2 2 
S{ SZ 
aoe + a 
ny nN 


The statistic t has the same interpretation as any x or ¢ statistic: it says how far 
X, — X is from zero in standard error units. 

The two-sample t statistic has approximately a t distribution. It does not have 
exactly a t distribution even if the populations are both exactly Normal. In practice, 
however, the approximation is very accurate. There are two practical options for 
using the two-sample t procedures: 


= Option 1. With software, use the statistic t with accurate critical values 
from the approximating t distribution. The degrees of freedom are calculated 
from the data by a somewhat messy formula. Moreover, the degrees of freedom 
may not be a whole number. 


standard error 


two-sample t statistic 
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» Option 2. Without software, use the statistic t with critical values from 
the t distribution with degrees of freedom equal to the smaller of n, — 1 and n, — 1. 
These procedures are always conservative for any two Normal populations. The 
confidence interval has a margin of error as large as or larger than is needed for 
the desired confidence level. The significance test gives a P-value equal to or 
greater than the true P-value. 


The two options are exactly the same except for the degrees of freedom used 
for t critical values and P-values. As the sample sizes increase, confidence levels 
and P-values from Option 2 become more accurate. The gap between what Option 2 
reports and the truth is quite small unless the sample sizes are both small and 
unequal.’ 


The Two-Sample t Procedures 


Draw an SRS of size n,; from a large Normal population with unknown mean p;, 
and draw an independent SRS of size n; from another large Normal population with 
unknown mean j;. A level C confidence interval for 4, — pe is given by 

2 2 
= = Si $2 
(x1 a X) ag {a fy fy 
Here t* is the critical value for confidence level C for the t distribution with degrees 
of freedom from either Option 1 (software) or Option 2 (the smaller of n; — 1 and 


hy = 1). 
To test the hypothesis Ho: , = Mz, calculate the two-sample t statistic 
xy —- X) 
(= = 
& S 
oe 
ny n2 


Find P-values from the t distribution with degrees of freedom from either Option 1 
(software) or Option 2 (the smaller of n; — 1 and n; — 1). 


We can now complete Example 21.2. 
BORE Mciculod The two-sample t statistic comparing the average minutes 


spent standing and walking in Group 1 (lean) and Group 2 (obese) is 


X; — Xz 
t= —L 
2 2 
st 3 
pee + —— 
ny No 


525.751 — 373.269 
= 67.4982 


+ 
10 10 
_ 152.482 _ 3 ag 
40.039 


Software (Option 1) gives one-sided P-value P = 0.0008 based on df = 15.174. 
Without software, use the conservative Option 2. Because n; — 1 =9 and 
nz — 1 = 9, there are nine degrees of freedom. Because H, is one-sided, the P-value 
is the area to the right of t = 3.808 under the t(9) curve. Figure 21.2 illustrates this 
P-value. Table C shows that t = 3.808 lies between the critical values t* for 0.0025 
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t distribution with 
9 degrees of freedom. 


P-value for t = 3.808 
is the area to the right 
of t = 3.808. 


0 t= 3.808 


and 0.001. So 0.001 < P < 0.0025. Option 2 gives a larger (more conservative) 
P-value than Option 1. As usual, the practical conclusion is the same for both versions 
of the test. 


Ree eM ew There is very strong evidence (P = 0.0008) that, on average, lean 
people spend more time walking and standing than do moderately obese people. m 


FIGURE 21.2 

Using the conservative Option 2, the 
P-value in Example 21.3 comes from 
the t distribution with nine degrees of 
freedom. 


i= 3.690 4.297 
One-sided P_ 0.0025 0.001 


Does lack of daily activity cause obesity? This is an observational study, and that 
affects our ability to draw cause-and-effect conclusions. It may be that some people 
are naturally more active and are therefore less likely to gain weight. Or it may be 
that people who gain weight reduce their activity level. The study went on to enroll 
most of the obese subjects in a weight-reduction program and most of the lean sub- 
jects in a supervised program of overeating. After eight weeks, the obese subjects had 
lost weight (mean eight kilograms) and the lean subjects had gained weight (mean 
four kilograms). But both groups kept their original allocation of time to the differ 
ent postures. This suggests that time allocation is biological and influences weight, 
rather than the other way around. The authors remark: “It should be emphasized 
that this was a pilot study and that the results need to be confirmed in larger studies.” 


‘PLAN: To estimate how much more active lean people are, we give a 90% confi- 
dence interval for ww; — M2, the difference in average daily minutes spent standing 
and walking between lean and mildly obese adults. 


As in Example 21.3, the conservative Option 2 uses nine 
degrees of freedom. Table C shows that the t(9) critical value is t* = 1.833. We are 
90% confident that ww, — pz lies in the interval 


e-Dee 42 
ny N2 
107.1212 67.4987 
= (525.751 = 373.269) ae 1.293\/ 10 + 10 


152.482 + 73.390 
79.09 to 225.87 minutes 
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Software using Option 1 gives the 90% interval as 82.35 to 222.62 minutes, based 
on t with 15.174 degrees of freedom. The Option 2 interval is wider because this 
method is conservative. Both intervals are quite wide because the samples are small 
and the variation among individuals, as measured by the two sample standard devia- 
tions, is large. Whichever interval we report, we are (at least) 90% confident that the 
mean difference in average daily minutes spent standing and walking between lean 
and mildly obese adults lies in this interval. m 


WSTATE® Do college students who have volunteered for community 
service work and those who have not differ in how attached they 
are to their friends? A study obtained data from 57 students who 
had done service work and 17 who had not. One of the response 
variables was a measure of attachment to friends as measured by 
the Inventory of Parent and Peer Attachment (larger scores indicate 
greater attachment). In particular, the response is a score based on 
the responses to 25 questions. Here are the results:? 


Hero Images/Getty Images 


Group Condition n x s 
1 Service 57 105.32 14.68 
2 No service 7 96.82 14.26 
sti — 
int META-ANALYSIS “PLAN: The investigator had no specific direction for the difference in mind before 
mr Small samples have large looking at the data, so the alternative is two-sided. We will test the hypotheses 
margins of error. Large samples 
are expensive. Often we can find Ho: by = be 
several studies of the same issue; if Ha: pi F pe 
we could combine their results, we 
would have a large sample with a | SOLVE: fila investigator says that the individual scores, examined separately in 
small margin of error. That is the the two samples, appear roughly Normal. There is a serious problem with the more 
idea of “meta-analysis.” Of course, important condition that the two samples can be regarded as SRSs from two student 
we can’t just lump the studies populations. We will discuss that after we illustrate the calculations. 
together, because of differences in The two-sample t statistic is 
design and quality. Statisticians 
have more sophisticated ways -_ X1 — Xz 
of combining the results. Meta- sf 2 
analysis has been applied to =i 4 72 
issues ranging from the effect of mt, le 
secondhand smoke to whether 105.32 — 96.82 
coaching improves SAT scores. ees . 14.262 
vA 17 
8.5 
= 2 = 2.142 
3.9677 
Software (Option 1) says that the two-sided P-value is P = 0.0414. 
Without software, use Option 2 to find a conservative P-value. There are 16 degrees 
of freedom, the smaller of 
n—-1=57-1=56 and n—-1=17-1=16 
df = 16 ; Dey 
te 2120 2.235 Figure 21.3 illustrates the P-value. Find it by comparing t = 2.142 with the two- 


sided critical values for the t(16) distribution. Table C shows that the P-value is 
Two-sided P__ 0.05 0.04 between 0.05 and 0.04. 
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t distribution with 16 
degrees of freedom. 


P-value is 2 times 
this area. 


t=-2.142 (0) t=2.142 


ESOMSRUA The data give moderately strong evidence (P < 0.05) that students 
who have engaged in community service do, on the average, differ from those who 
have not engaged in community service in how attached they are to their friends (and 
the data suggest they are more attached to their friends). m 


Is the t test in Example 21.5 justified? The student subjects were “enrolled in a 
course on U.S. Diversity at a large mid-western university.” Unless this course is 
required of all students, the subjects cannot be considered a random sample even from 
this campus. Students were placed in the two groups on the basis of a questionnaire, 39 
in the “no service” group and 71 in the “service” group. The data were gathered from 
a follow-up survey two years later; 17 of the 39 “no service” students responded (44%), 
compared with 80% response (57 of 71) in the “service” group. Nonresponse is con- 
founded with group: students who had done community service were much more likely 
to respond. Finally, 75% of the “service” respondents were women, compared with 47% 
of the “no service” respondents. A person’s sex, which can strongly affect attachment, 
is badly confounded with the presence or absence of community service. The data 
are so far from meeting the SRS condition for inference that the t test is meaning- 
less. Difficulties like these are common in social science research, where confounding 
variables have stronger effects than is usual when biological or physical variables are 
measured. This researcher honestly disclosed the weaknesses in data production but 
left it to readers to decide whether to trust her inferences. 


() LaunchPad Online Resources 


* The StatBoards video, Two-Sample t Confidence Intervals, discusses examples 
of constructing confidence intervals using both options. 


* The StatBoards video, Two-Sample t Tests, discusses examples of testing 
using both options. 


* The Snapshots video, Comparing Two Means, provides further discussion of 
procedures for comparing two means. 


* The StatClips video, Confidence Intervals: Intervals for Differences Example B, 
presents an example of a confidence interval for a difference in two means. 


FIGURE 21.3 

The P-value in Example 21.5. Because 
the alternative is two-sided, the 
P-value is double the area to the right 
of t= 2.142 
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Apply Your Knowledge 


In exercises that call for two-sample t procedures, use Option 1 if you have technology 
that implements that method. Otherwise, use Option 2 (degrees of freedom the smaller of 
itn = I eine a = 1), 


21.5 Nintendo and Laparoscopic Skills. In laparoscopic surgery, a video 
42 camera and several thin instruments are inserted into the patient’s abdomi- 
nal cavity. The surgeon uses the image from the video camera positioned 
inside the patient’s body to perform the procedure by manipulating the 
instruments that have been inserted. It has been found that the Nintendo 
Wii™, with its motion-sensing interface, replicates the movements 
required in laparoscopic surgery more closely than other video games. If 
training with a Nintendo Wii™ can improve laparoscopic skills, it can com- 
plement the more expensive training on a laparoscopic simulator. Forty- 
two medical residents were chosen and all were tested on a set of basic 
laparoscopic skills. Twenty-one were selected at random to undergo system- 
atic Nintendo Wii™ training for one hour a day, five days a week, for four 
weeks. The remaining 21 residents were given no Nintendo Wii™ training 
and asked to refrain from video games during this period. At the end of four 
weeks, all 42 residents were tested again on the same set of laparoscopic 
skills. One of the skills involved a virtual gall bladder removal, with several 
performance measures including time to complete the task recorded. Here 
are the improvement (before minus after) times in seconds after four weeks 

for the two groups:* NINT 


Treatment Control 
291 134 186 128 84 243 21 66 54 85 229 92 
212) 121 134 221 59 244 43 27 77 -29 -14 88 
79 +333 -13 -16 71-16 = 145 110 32 90 45 —81 
71 =77—=«144 68 61 44 


Does the Nintendo Wii™ training significantly increase the mean improve- 
ment time? Follow the four-step process as illustrated in Examples 21.2 
and 21.3. 
21.6 Daily Activity and Obesity. We can conclude from Examples 21.2 and 
45 21.3 that mildly obese people spend less time standing and walking (on the 
average) than lean people. Is there a significant difference between the mean 
times the two groups spend lying down? Use the four-step process to answer 
this question from the data in Table 21.1. Follow the model of Examples 21.2 
and 21.3. ACTIVE 
21.7 Nintendo and Laparoscopic Skills, Continued. Use the data in 
Exercise 21.5 to give a 99% confidence interval for the difference in mean 


improvement times between the treatment and control groups. jy.) NINT 


Software should use Option | for the degrees of freedom to give accurate confi- 
dence intervals and P-values. Unfortunately, there is variation in how well software 
implements Option 1. Figure 21.4 displays output from a graphing calculator, three 
statistical programs, and a spreadsheet program for the test of Example 21.3. All 
four claim to use Option 1. The two-sample t statistic is exactly as in Example 21.3, 


Texas Instruments Graphing Calculator 


2—-SameTTest 


a ed 
L=3, 868375604 
P=, 


4161984 e -4 
df=15.17355983¢ 
H1=5275. 7515 

4 2697 


Minitab 
Session 


Two-sample T for stand 


StDev_ SI 
107 
67.5 


Mean 
526 
37323 


group N 
1 10 
2 10 


Difference = mu (1) 
Estimate for difference: 
T-Test of difference = 0 


— mu (2) 
152:.5 


(vs >): T-Value = 


Microsoft Excel 


3.81 P-Value = 


t-Test: Two-Sample Assuming Unequal Variances 


Lean 
D204 D143 
11474 .8903 


Mean 

Variance 
Observations 
Hypothesized Mean 


3 ..808375604 
0.000856818 
1.753051038 
0.001713635 
2.131450856 


t Stat 

P(T<=t) one-tail 

t Critical one-tail 
T<=t) two-tail 
[<t) two-tail 


Q 
BR BR 
= uw ° 
i 
S (2) 


| 


Obese 
373.2692 
4556.019849 


JMP 
Assuming unequal variances 
152.482 ) 3.808376 
40.039 1517233 
20 ST 0.0017* 
S07 22 0.0008 , 
0.95 @,cg0m 2% 
FIGURE 21.4 
hree statistical programs, and a spreadsheet program. 


"100-50 


0 


50 100 150 200 


21.4 Using Technology 


0.001 DF 


Crunchlt! 


Results - t 2-Sample aix 
Export 
Null hypothesis: 


Difference of means = 0 


Alternative hypothesis: Difference of means > 0 


crip] De Sern Moar 
fr frors [ease [ro 
a _jerso fares [ro 
df: 


ISA 
Difference of means: 152.5 


3.808 
0.0008410 


t statistic: 


P-value: 


The two-sample t procedures applied to the data on activity and obesity: output from a graphing calculator, 
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t = 3.808. You can find this in all four outputs (Minitab rounds to 3.81; Excel and 
the graphing calculator give additional decimal places). The different technologies 
use different methods to find the P-value for t = 3.808. 


i 


i 


Crunchlt!, JMP and the calculator get Option 1 completely right. The accurate 
approximation uses the t distribution with approximately 15.174 (Crunchlt! 
rounds this to 15.17) degrees of freedom. The P-value is P = 0.0008. 


Minitab uses Option 1, but it truncates the exact degrees of freedom to 

the next smaller whole number to get critical values and P-values. In 

this example, the exact df = 15.174 is truncated to df = 15, so that 
Minitab’s results are slightly conservative. That is, Minitab’s P-value 
(rounded to P = 0.001 in the output) is slightly larger than the full 
Option | P-value. 

Excel rounds the exact degrees of freedom to the nearest whole number, so 
that df = 15.174 becomes df = 15. Excel’s method agrees with Minitab’s in 
this example. But when rounding moves the degrees of freedom up to the 
next higher whole number, Excel’s P-values are slightly smaller than is correct. 
This is misleading, another illustration of the fact that Excel is substandard as 
statistical software. 


JMP’s and Excel’s label that the test assumes unequal variances is a bit mis- 


not the two populations have the same variance. There is an old-fashioned 


A leading. The two-sample t procedures we have described work whether or 


special procedure that works only when the two variances are equal. We 


discuss this method in Section 21.7, but there is no need to use it in two-sample 
problems. 


Although different calculators and software give slightly different P-values, in 


practice you can just accept what your technology says. The small differences in 
P don’t affect the conclusion. Even “between 0.001 and 0.0025” from Option 2 
(Example 21.3) is close enough for practical purposes. 


Apply Your Knowledge 


21.8 Perception of Life Expectancy. Do women and men differ in how they 
perceive their life expectancy? A researcher asked a sample of men and women 
to indicate their life expectancy. This was compared with values from ac- 
tuarial tables, and the relative percent difference was computed (perceived 
life expectancy minus life expectancy from actuarial tables was divided by 
life expectancy from actuarial tables and converted to a percent). Here are 


the relative percent differences for all men and women over the age of 70 in 
the sample:° LIFEEXP 


Men —28 -23 -20 -19 -14 —-13 
Womens |-—200— 19 S51 O85 


Figure 21.5 shows output for the two-sample t test using Option 1. (This 
output is from Crunchlt! software, which does Option 1 without round- 
ing or truncating the degrees of freedom.) Do men and women over 
70 years old differ in their perceptions of life expectancy? Using the output in 
Figure 21.5, write a summary in a sentence or two, including t, df, P, and 
a conclusion. 
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(inesults-€2-5 ae rar FIGURE 21.5 
Two-sample t output from Crunchit! for 
Export * Exercise 21.8. 


Null hypothesis: Difference of means = 0 


Alternative hypothesis: Difference of means is not 0 


Seo] saree Nean|n 
[este [080 
F [seo faa 
df: 


10.68 
Difference of means: -6.786 


t statistic: -2.177 


P-value: 0.05281 


The two-sample t procedures are more robust (less sensitive to departures from 
our conditions for inference for comparing two means) than the one-sample t meth- 
ods, particularly when the distributions are not symmetric. When the sizes of the 
two samples are equal and the two populations being compared have distributions 
with similar shapes, probability values from the t table are quite accurate for a broad 
range of distributions when the sample sizes are as small as ny = n) = 5.6 When 
the two population distributions have different shapes, larger samples are needed. 

As a guide to practice, adapt the guidelines given on page 470 for the use of one- 
sample t procedures to two-sample procedures by replacing “sample size” with the 

“sum of the sample sizes,” n; + n;. These guidelines err on the side of safety, 
A especially when the two samples are of equal size. In planning a two-sample 

study, choose equal sample sizes whenever possible. The two-sample t procedures 
are most robust against non-Normality in this case, and the conservative Option 2 prob- 
ability values are most accurate. 


(4) LaunchPad Online Resources 


* The StatBoards video, Robustness and Determining If t Procedures Are 
Appropriate, discusses some examples. 


Apply Your Knowledge 


21.9 Do Good Smells Bring Good Business? Businesses know that cus- 
tomers often respond to background music. Do they also respond to odors? 
One study of this question took place in a small pizza restaurant in France 
on two Saturday evenings in May. On one of these evenings, a relaxing 
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lavender odor was spread through the restaurant. On the other evening, 
no scent was used. Table 21.2 gives the time (in minutes) that two samples 
of 30 customers spent in the restaurant and the amount they spent (in 
euros).’ The two evenings were comparable in many ways (weather, cus- 
tomer count, and so on), so we are willing to regard the data as indepen- 
dent SRSs from spring Saturday evenings at this restaurant. The authors 
say, “Therefore at this stage it would be impossible to generalize the results 
to other restaurants.” £j)) opors2 


eS 
= 


NO ODOR LAVENDER 
MINUTES EUROS SPENT MINUTES EUROS SPENT 
103 15.9 92 21.9 
68 18.5 126 18.5 
79 15.9 114 22.3 
106 18.5 106 21.9 
72 18.5 89 18.5 
121 DIL) 137 DAD 
92 15.9 93 18.5 
84 15.9 76 TRS 
72 15.9 98 21.5 
92 15.9 108 DiS 
85 15.9 124 215 
69 18.5 105 18.5 
73 18.5 129 25.5 
87 18.5 103 18.5 
109 20.5 107 18.5 
115 18.5 109 JAS) 
91 18.5 94 18.5 
84 15.9 105 18.5 
76 15.9 102 24.9 
96 1529 108 BAS) 
107 18.5 95 25.9 
98 SED) 121 21.9 
92 15.9 109 18.5 
107 18.5 104 18.5 
93 15.9 116 22.8 
118 18.5 88 18.5 
87 15.9 109 21.9 
101 259 97 20.7 
75 12.9 101 21.9 
86 15.9 106 2225 


(a) Does a lavender odor encourage customers to stay longer in the 
restaurant? Examine the time data and explain why they are suitable 
for two-sample t procedures. Use the two-sample t test to answer the 
question posed. 


21.6 Details of the t Approximation 


(b) Does a lavender odor encourage customers to spend more while in 
the restaurant? Examine the spending data. In what ways do these 
data deviate from Normality? Explain why, with 30 observations, the 
t procedures are reasonably accurate for these data. Use the two- 
sample t test to answer the question posed. 

21.10 Compressing Soil. Farmers know that driving heavy equipment on wet 
soil compresses the soil and injures future crops. Here are data on the “pen- 
etrability” of the same type of soil at two levels of compression.’ Penetrabil- 
ity is a measure of how much resistance plant roots will meet when they try 
to grow through the soil. am SOILCM 


Compressed Soil 


Mists) sols) SYD ASV) Diy fos as} 30S} PE sho) 
30s} Ais Ts} ASKS} BO) eS): AS I) BIN} Sul 


Intermediate Soil 


3.14 3.38 3.10 3.40 3.38 3.14 3.18 3.26 2.96 3.02 
3.54 3.36 3.18 3.12 3.86 2.92 346 3.44 3.62 4.26 


(a) Make stemplots to investigate the shapes of the distributions. The 
penetrabilities for intermediate soil are skewed to the right and have 
a high outlier. Returning to the source of the data shows that the 
outlying sample had unusually low soil density, so that it belongs in 
the “loose soil” class. We are justified in removing the outlier. 


(b) We suspect that the penetrability of compressed soil is less than that 
of intermediate soil. Do the data (with the outlier removed) support 
this suspicion? 

21.11 Weeds among the Corn. Lamb’s-quarter is a common weed that inter- 
feres with the growth of corn. An agriculture researcher planted corn at 
the same rate in 16 small plots of ground, then weeded the plots by hand 
to allow a fixed number of lamb’s-quarter plants to grow in each meter of 
corn row. No other weeds were allowed to grow. Here are the yields of corn 
(bushels per acre) for only the experimental plots controlled to have one 
weed per meter of row and nine weeds per meter of row:° £j),) wocorN 


One weed/meter 166.2 157.3 166.7 161.1 
Nine weeds/meter 162.8 142.4 162.8 162.4 


Explain carefully why a two-sample t confidence interval for the difference 
in mean yields may not be accurate. 

21.12 Compressing Soil, Continued. Use the data in Exercise 21.10, omitting 
the outlier, to give a 90% confidence interval for the decrease in penetrabil- 
ity of compressed soil relative to intermediate soil. SOILCM2 


=| 
= 


The exact distribution of the two-sample ¢ statistic is not a t distribution. 
Moreover, the distribution changes as the unknown population standard deviations 
go and a, change. However, an excellent approximation is available. We call this 
Option 1 for t procedures. 


*This section can be omitted unless you are using software and wish to understand what the software 
does. 
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Approximate Distribution of the 


Two-Sample t Statistic 


The distribution of the two-sample t statistic is very close to the t distribution with 
degrees of freedom df given by 


1 (=) 1 (: ) 
+ 
iy, = 1 Ny ey 1 nN? 


This approximation is accurate when both sample sizes n, and n; are 5 or larger. 


df = 


In the experiment of Examples 21.2 and 21.3, the data on minutes per day spent 
standing and walking give 
— | Group n x s 


Group 1 (lean) 10 525.751 107.121 
Group 2 (obese) 10 373.269 67.498 


The two-sample t test statistic calculated from these values is t = 3.808. 

The one-sided P-value is the area to the right of 3.808 under a t density curve, as 
in Figure 21.2. The conservative Option 2 uses the t distribution with nine degrees 
of freedom. Option 1 finds a very accurate P-value by using the t distribution with 
degrees of freedom df given by 


Gen n ney 
10 10 
4 os aa (S24se"y 
9 10 9\ 10 
_ 2,569,894 
~ 169,367.2 


df 


= 15.1735 


These degrees of freedom appear in the graphing calculator output in Figure 21.4. 
Because the formula is messy and roundoff errors are likely, we don’t recommend 
calculating df by hand. = 


The degrees of freedom df is generally not a whole number. It is always at least as large 
as the smaller of n,; — 1 and n; — 1. The larger degrees of freedom that result from 
Option 1 give slightly shorter confidence intervals and slightly smaller P-values than the 
conservative Option 2 produces. There is a t distribution for any positive degrees of free- 
dom, even though Table C contains entries only for whole-number degrees of freedom. 

The difference between the t procedures using Options | and 2 is rarely of practi- 
cal importance. That is why we recommend the simpler, conservative Option 2 for 
inference without software. With software, the more accurate Option | procedures 
are painless. 


Apply Your Knowledge 


21.13 Friends Shrink Foes. Ina study of the perceived formidability of a picture 
of a possible foe (said to be a terrorist), adult males were presented with the 
picture and asked to rate the formidability of the foe. Thirty-seven subjects 


21.8 Avoid Inference about Standard Deviations 


were with friends when asked to provide the rating, while 21 were alone. Is 
the perceived formidability less in the presence of friends than when alone? 
Larger scores indicate a greater perceived formidability.'° Software that uses 
Option 1 gives these summary results: 


Mean Std dev Std err = dt Pp 
0.68 LS geal byl 2,255 35.3 =0.02 


0.83 0.18 


Situation n 
Alone BY 0) oA) 


Wath friends: 20) 0-19 


Starting from the sample means and standard deviations, verify each of 
these entries: the standard errors of the means; the degrees of freedom for 
the two-sample t; the value of t. 

21.14 Perception of Life Expectancy. Figure 21.5 gives output for the life ex- 
pectancy data in Exercise 21.8 from software that does Option 1 with the 
correct degrees of freedom. What are x; and s; for the two samples? Starting 
from these values, find the t test statistic and its degrees of freedom. Your 
work should agree with Figure 21.5. 

21.15 Friends Shrink Foes, Continued. Write a sentence or two summa- 
rizing the comparison of males alone and males with friends in Exercise 


21.13, as if you were preparing a report for publication. Use the output in 
Exercise 21.13. 


Most software and graphing calculators, including all four illustrated in Figure 21.4, 
offer a choice of two-sample t statistics. One is often labeled for “unequal” vari- 
ances, and the other for “equal” variances. The “unequal” variance procedure is 
our two-sample t. This test is valid whether or not the population variances are equal. 
The other choice is a special version of the two-sample t statistic that assumes that 
the two populations have the same variance. This procedure averages (the statisti- 
cal term is “pools”) the two sample variances to estimate the common population 
variance. The resulting statistic is called the pooled two-sample t statistic. It is equal 
to our ¢ statistic if the two sample sizes are the same, but not otherwise. We could 
choose to use the pooled t for tests and confidence intervals. 

The pooled t statistic has exactly the t distribution with n, + n) — 2 degrees 
of freedom if the two population variances really are equal and the population 
distributions are exactly Normal. The pooled t was in common use before software 
made it easy to use Option 1 for our two-sample t statistic. Of course, in the real 
world distributions are not exactly Normal and population variances are not exactly 

equal. In practice, the Option 1 two-sample t procedures are almost always 
A more accurate than the pooled procedures. Our advice: Never use the pooled 
t procedures if you have software that will implement Option 1. 


Two basic features of a distribution are its center and variability. In a Normal popu- 
lation, we measure center by the mean and variability by the standard deviation. We 
use the t procedures for inference about population means for Normal populations, 


*This short section offers advice on what not to do. This material is not needed to understand the 
rest of the book. 
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| IS MONEY THE 


ROOT OF ALL EVIL? 


That may go too far, but Kathleen 
Vohs and her co-workers show 

that even thinking about money 
has strong effects. Exercise 21.47 
describes a small part of their work. 
What does Professor Vohs say about 
the consequences of having money? 
“Money makes people feel self- 
sufficient and behave accordingly.” 
With money, you can achieve your 
goals with less help from others. 
You feel less dependent on others 
and more willing to work toward 
your own goals. Maybe that’s good. 
You also prefer to be less involved 
with others, so that self-sufficiency 
is a barrier to close relationships 
with others. Maybe that’s not good. 
Scientists don’t tell us what’s good or 
not good, just that money increases 
our sense of self-sufficiency. 
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and we know that t procedures are widely useful for non-Normal populations as well. 
It is natural to turn next to inference about the standard deviations of Normal popu- 
lations. Our advice here is short and clear: don’t do it without expert advice. 

There are methods for inference about the standard deviations of Normal popu- 
lations. The most common such method is the “F test” for comparing the standard 
deviations of two Normal populations. You will find this test in the menus of most 

statistical software. Unlike the t procedures for means, the F test for standard 
A deviations is extremely sensitive to non-Normal distributions. This lack of ro- 

bustness does not improve in large samples. It is difficult in practice to tell 
whether a significant test result is evidence of unequal population variability or 
simply a sign that the populations are not Normal. Because this test is of little use 
in practice, we don’t give its details. 

The deeper difficulty underlying the very poor robustness of Normal population 
procedures for inference about variability already appeared in our work on describ- 
ing data. The standard deviation is a natural measure of variability for Normal dis- 
tributions but not for distributions in general. In fact, because skewed distributions 
have asymmetric tails, no single numerical measure does a good job of describing 
the variability of a skewed distribution. In summary, the standard deviation is 
not always a useful parameter, and even when it is (for symmetric distributions), 

the results of inference about the standard deviation are not trustworthy. 
A Consequently, we do not recommend trying to make inferences about population 
standard deviations in basic statistical practice." 


To determine whether the effect of a new treatment differs from a standard treat- 
ment or a placebo, we often test the null hypothesis of no treatment effect. When 
units are assigned to treatments completely at random, this fact allows us to test 
this hypothesis. This test does not require the assumption that the data come from 
a Normal population. The reasoning was presented in Example 12.8 (page 287) and 
Example 15.11 (page 363) and we revisit it here. 


We have six subjects for an experiment: Ari, Luis, Troy, Ana, Deb, and Hui. Three are 
to be assigned completely at random to a new experimental weight loss treatment 
and three to a placebo to see if the experimental treatment results in greater weight 
loss than a placebo. Thus, we test the hypotheses 


Ho: no treatment effect 
H,: new treatment results in a greater mean weight loss than a placebo 


We will compare mean weight losses for the treatment and placebo groups to test 
these hypotheses. 

Suppose Luis, Deb, and Hui are assigned to the experimental treatment. Here are 
the weight losses after six weeks. 


Treatment Group (Person) Placebo Group (Person) 
15 (Luis) 2 (Ari) 
12 (Deb) 8 (Troy) 
9 (Hui) 1 (Ana) 


*You may prefer to skip this material if you have not discussed resampling methods. 


The mean weight loss for the treatment group is 2*43***® = 12.00, and the 
mean weight loss for the placebo group is **§*+ = 3.67. The difference in mean 
weight losses (treatment group weight loss minus placebo group weight loss) is 
12.00 — 3.67 = 8.33. 

If the null hypothesis of no treatment effect is true, then Bob was going to 
lose 15 pounds whether he was assigned to the treatment group or the placebo 
group, and similarly for the other subjects. This fact allows us to compute a P-value 
using only the random assignment of the subjects to the two groups. Here is the 
reasoning. 

If the subjects are assigned to the two treatments completely at random, then 
all possible assignments of subjects to treatments are equally likely. There are 
20 possible ways of assigning three subjects to the treatment group and three 
to the placebo group, hence each has probability 1/20 of occurring. If the null 
hypothesis of no treatment effect is true, then the weight loss we observe for any 
subject will be the same regardless of which treatment the subject receives. Thus, 
the responses we actually observe for each treatment are due only to the “luck of 
the draw,” namely the particular assignment of subjects to treatments that our ran- 
domization produced. For each possible assignment, we can list the weight losses 
we would observe with each treatment and compute the difference in mean weight 
losses. We did this in Example 15.11, and we display the results again here. These 
are all equally likely. 


Difference in 


Treatment Group Placebo Group Mean Weight Loss 


Ari, Luis, Troy Ana, Deb, Hui 1.00 
Ari, Luis, Ana Troy, Deb, Hui =3.67/ 
Ari, Luis, Deb Troy, Ana, Hui 3.67 
Ari, Luis, Hui Troy, Ana, Deb 1.67 
Ari, Troy, Ana Luis, Deb, Hui —8.33 
Ari, Troy, Deb Luis, Ana, Hui —1.00 
Ari, Troy, Hui Luis, Ana, Deb —3.00 
Ari, Ana, Deb Luis, Troy, Hui =6.33 
Ari, Ana, Hui Luis, Troy, Deb —7.67 
Ari, Deb, Hui Luis, Troy, Ana =0,33! 
Luis, Troy, Ana Ari, Deb, Hui 0.33 
Luis, Troy, Deb Ari, Ana, Hui 7.67 

Luis, Troy, Hui Ari, Ana, Deb 6.33 
Luis, Ana, Deb Ari, Troy, Hui 3.00 
Luis, Ana, Hui Ari, Troy, Deb 1.00 
Luis, Deb, Hui Ari, Troy, Ana 8.33 
Troy, Ana, Deb Ari, Luis, Hui —1.67 
Troy, Ana, Hui Ari, Luis, Deb a OLOl 
Troy, Deb, Hui Ari, Luis, Ana 3.67 
Ana, Deb, Hui Ari, Luis, Troy —1.00 


Ordering the differences in mean weight losses from low to high, recalling each as- 
signment of subjects to the two groups has probability 1/20 = 0.05, and combining 
duplicates, we find, as in Example 15.11: 
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Weight Loss 8.33 7.67 6.33 3.67 3.00 1.67 1.00 0.33 
Probability 0.05 0.05 0.05 0.10 0.05 0.05 0.10 0.05 
Weight Loss 0.33 1.00 1.67 3.00 3.67 6.33 7.67 8.33 
Probability 0.05 0.10 0.05 0.05 0.10 0.05 0.05 0.05 


The alternative hypothesis is that the new treatment will produce a greater mean 
weight loss than the placebo, hence the P-value (the probability of observing a 
weight loss as large as or larger than 8.33 if the null hypothesis is true) for the 
experiment is 0.05. 

You can check that if we use the two-sample t procedure with Option 1 to test 
the hypotheses 


Ho: bi = Me 
Hai bi > be 


the P-value is 0.0216 for these data. m 


In general, if experimental units are assigned to two treatment groups completely 
at random, and our null hypothesis is “no treatment effect,” we can test hypotheses 
using sample means as follows. First, list all possible ways units can be assigned to 
treatment groups.” Second, based on the data obtained, for each possible assign- 
ment determine what the difference in means (mean for treatment | minus mean 
for treatment 2) would be. Under the null hypothesis, each of these is equally likely. 
Third, determine the distribution of these possible outcomes by listing all the dif- 
ferent possible mean differences and their corresponding probabilities (this can also 
be represented by a histogram). Using this sampling distribution, determine the 
P-value of the actual mean difference you obtained. 

This procedure is sometimes referred to as a permutation test and the sampling 
distribution is referred to as the permutation distribution. It assumes units are 
assigned to treatments completely at random and does not require the assumption 
of Normality. However, there are some practical issues. First, unless the number of 
ways units can be assigned to treatments is sufficiently large, P-values may not be 
small. In Example 21.7, the smallest possible P-value is 0.05 (and 0.10 for a two- 
sided alternative). This is because with only 20 possible assignments, the largest 
and smallest possible outcomes have probability at least 0.05. If we can assume 
Normality and use the two-sample t procedure, we can get more accurate P-values. 

Second, listing all possible outcomes can be quite tedious unless you have special 
software. There are 70 ways eight units can be divided into two groups of size 4, 
252 ways 10 units can be divided into two groups of size 5, 924 ways 12 units can 
be divided into two groups of size 6, and 184,756 ways 20 units can be divided into 
two groups of size 10. When the number of units is large, one could use resampling 
to estimate the sampling distribution. For example, if we have 10 units assigned 
to each of two treatments, rather than listing all 184,756 possible assignments, we 
could take 1000 samples of size 10 from the 20 units and determine the difference 
in means we would observe for each of these 1000 samples. 

Third, permutation tests are most likely to be useful with small to moderate size 
experiments because of the robustness of t procedures. In small to moderate size 
experiments, if there is good reason to assume the sampling distribution of the two- 
sample t statistic has approximately a t distribution, the two-sample t procedure may 
be preferable to a permutation test. 
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(9) LaunchPad Online Resources 
* The StatBoards video, Permutation Tests, discusses additional examples. 


* The Snapshots video, Resampling Procedures, includes some discussion of 
permutation tests. 


Apply Your Knowledge 


21.16 A Very Simple Setting. Does a zinc supplement taken at the onset of 
cold symptoms reduce the time one has a cold? Five volunteers (Dan, Hal, 
Joe, Kay, and Zoe) agree to take part in an experiment. Three are assigned 
completely at random to receive the zinc supplement and the other two 
receive the placebo. The experiment is double-blind. The results are (times 
are duration of cold in hours): 


Zinc Group (Person) Placebo Group (Person) 
38 (Dan) 44 (Hal) 
31 (Joe) 47 (Zoe) 
35 (Kay) 


(a) There are 10 possible ways the five subjects can be assigned to the 
two groups, with the zinc group having size 3 and the placebo group 
size 2. List these. 


(b) For each, determine the difference in mean length of cold (mean 
for the placebo group minus mean for the zinc group). Combine any 
duplicates and make a table of the possible mean differences and the 
corresponding probability of each under the null hypothesis of no 
treatment effect. (Each of the 10 possible assignments of subjects to 
treatments has probability 1/10 under the null hypothesis.) This is 
the permutation distribution. 


(c) Compute the P-value of the data. Assume the alternative hypothesis 
is that the mean duration of a cold is less for zinc. 

(d) In this example, is it possible to demonstrate significance at the 
5% level using the permutation test? Explain. 


(e) Assume that cold duration is Normally distributed for both zinc and 
the placebo. Use the two-sample t procedure to test the hypotheses. 
Use Option 1 if you have access to software. 


21.17 Weeds among the Corn. Exercise 21.11 discusses a completely random- 
ized experiment. Two treatments—one weed per meter and nine weeds per 
meter—were assigned completely at random to eight plots of land. Each was 
assigned to four plots. Here are the yields of corn (bushels per acre) :¥ 


One weed/meter 166.2 157.3 166.7 161.1 
Nine weeds/meter 162.8 142.4 162.8 162.4 


In Exercise 21.11, you decided that two-sample t procedures might not be 
accurate. However, the permutation test can be used. Figure 21.6 displays 
the distribution of all possible mean differences for the 70 different ways 
treatments could be assigned to plots. Use Figure 21.6 to estimate the P-value 
of these data. 


506 CHAPTER 21 & Comparing Two Means 


FIGURE 21.6 
Permutation distribution for 
Exercise 21.17. 
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CHAPTER 21 SUMMARY 


Chapter Specifics 


The data in a two-sample problem are two independent SRSs, each drawn from a 
separate population. 


Tests and confidence intervals for the difference between the means p1, and pL, of two 
Normal populations start from the difference x; — x, between the two sample means. 
Because of the central limit theorem, the resulting procedures are approximately correct 
for other population distributions when the sample sizes are large. 


Draw independent SRSs of sizes n, and n, from two Normal populations with parameters 
M1, 01, and fy, 0). The two-sample t statistic is 


_ (= %) — (er = bn) 
aa 
Ny nN? 


The statistic t has approximately a t distribution. 


There are two choices for the degrees of freedom of the two-sample t statistic. Option 1: 
software produces accurate probability values using degrees of freedom calculated from 
the data. Option 2: for conservative inference procedures, use degrees of freedom equal 
to the smaller of n; — 1 and n, — 1. 


The confidence interval for w, — p22 is 
(x — X)) + t* 


The critical value t* from Option 1 gives a confidence level very close to the desired level 
C. Option 2 produces a margin of error at least as wide as is needed for the desired level C. 


Significance tests for Ho: ww, = fz are based on 


P-values calculated from Option 1 are very accurate. Option 2 P-values are always at 
least as large as the true P. 


m The two-sample t procedures are quite robust against departures from Normality. 
Guidelines for practical use are similar to those for one-sample t procedures. Equal 
sample sizes are recommended. 


m Procedures for inference about the standard deviations of Normal populations are very 
sensitive to departures from Normality. Avoid inference about standard deviations 
unless you have expert advice. 


m= When experimental units are assigned to treatments completely at random, one can 
use permutation tests to test the hypothesis of no treatment effect. This test does not 
require the assumption that the data come from Normal populations. 


Link It 


In Chapter 20, we studied inference for the mean of a Normal population using proce- 
dures based on the t distribution. These procedures are more realistic than those studied 
in Chapters 15 to 17 because t procedures do not require that we know the population 
variance. In practice, the most common use of t procedures for a single population mean is 
with matched pairs data because most research studies make comparisons between two or 
more populations. 

In this chapter, we discuss t procedures for comparing the means of two Normal popu- 
lations when we have independent samples from these two populations. These t proce- 
dures are quite common in practice, and one encounters them in many research papers. 
Researchers typically report the means and standard deviations for the two samples and, in 
the case of tests of hypotheses, the value of the t statistic and the corresponding P-value. 

In the next two chapters, we extend our procedures for inference to additional settings 
that occur frequently in practice—in particular, to inference for population proportions. 
And we will see that the basic ideas about confidence intervals and tests of hypotheses that 
we learned in Chapters 15 to 17 still apply. 


() LaunchPad Online Resources 


If you are having difficulty with any of the sections of this chapter, these online 
resources should help prepare you to solve the exercises at the end of this chapter. 


* StatTutor starts with a video review of each section and asks a series of 
questions to check your understanding. You may find this helpful if you need 
additional assistance understanding any of the material in this chapter. 


* LearningCurve provides you with a series of questions about the chapter 
geared to your level of understanding. 
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| CHECK YOUR SKILLS 


21.18 


21.19 


21.20 


21.21 


21.22 


The 2113 National Assessment of Educational 
Progress (NAEP) gave a mathematics test to a ran- 
dom sample of twelfth-graders in Michigan. The 
mean score was 154 out of 300. To give a confidence 
interval for the mean score of all Michigan twelfth- 
graders, you would use 


(a) the two-sample t interval. 
(b) the matched pairs t interval. 
(c) the one-sample t interval. 


In the 2013 NAEP sample of Michigan twelfth- 
graders, the mean mathematics scores were 152 for 
female students and 156 for male students. To see if 
this difference is statistically significant, you would use 


(a) the two-sample t test. 
(b) the matched pairs t test. 
(c) the one-sample t test. 


Two new devices for testing blood sugar levels have 
been developed. How do these devices compare? 
You test blood sugar levels of 20 diabetics with both 
devices and use 


(a) the two-sample t test. 
(b) the matched pairs t test. 
(c) the one-sample t test. 


One major reason that the two-sample t procedures 
are widely used is that they are quite robust. This 
means that 


(a) t procedures do not require that we know the 
standard deviations of the populations. 

(b) confidence levels and P-values from the t pro- 
cedures are quite accurate even if the population 
distribution is not exactly Normal. 

(c) t procedures compare population means, a com- 
parison that answers many practical questions. 


A study of the effect of exposure to color (red or blue) 
on the ability to solve puzzles used 42 subjects. Half 
the subjects (21) were asked to solve a series of puz- 
zles while in a red-colored environment. The other 
half were asked to solve the same series of puzzles 
while in a blue-colored environment. The time taken 
to solve the puzzles was recorded for each subject. 
To compare the mean times for the two groups of 


21.23 


21.24 


21.25 


21.26 


subjects using the two-sample t procedures with the 
conservative Option 2, the correct degrees of free- 
dom is 

(a) 41. (b) 40. (c) 20. 

The 21 subjects in the red-colored environment had a 
mean time for solving the puzzles of 9.64 seconds with 
standard deviation 3.43; the 21 subjects in the blue- 
colored environment had a mean time of 15.84 sec- 
onds with standard deviation 8.65. The two-sample t 
statistic for comparing the population means has value 


(a) 1.50. (b) 3.05. (c) 6.20. 


A study of the use of social media asked a sample of 
488 American adults under the age of 40 and a sam- 
ple of 421 American adults aged 40 or over about 
their use of social media. Based on their answers, 
each subject was assigned a social media usage score 
on a scale of 0 to 25. Higher scores indicate greater 
usage. The subjects were chosen by random digit di- 
aling of telephone numbers. Are the conditions for 
two-sample t inference satisfied? 


(a) Maybe: the SRS condition is OK but we need to 
look at the data to check Normality. 

(b) No: scores in a range between 0 and 25 can’t be 
Normal. 

(c) Yes: the SRS condition is OK and large sample 
sizes make the Normality condition unnecessary. 


We suspect that younger adults use social media 
more than adults aged 40 or over. To see if this is 
true, test these hypotheses for the mean social media 
usage scores of all adults under 40 and all adults 40 
and over: 


(a) Ho: W<4o = Me4o Versus Hy: Wes > L=a40 

(b) Ho: ego = Mego versus Hy: Wego A Me4o 

(c) Ho: W<4o = Meo Versus Hy: egy < Paap 

The two-sample t statistic for the social media use 
study (“under 40” mean minus “40 and over” mean) 
is t = 3.18. The P-value for testing the hypotheses 
from the previous exercise satisfies 


(a) 0.001 < P < 0.005. 
(b) 0.0005 < P < 0.001. 
(c) 0.001 < P < 0,002. 


| CHAPTER 21 EXERCISES 


Exercises 21.25 to 21.34 are based on summary statistics 
rather than raw data. This information is typically all that 
is presented in published reports. You can perform inference 
procedures by hand from the summaries. Use the conserva- 
tive Option 2 (degrees of freedom the smaller of n; — 1 and 
n, — 1) for two-sample t confidence intervals and P-values. 
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You must trust that the authors understood the condi- 
tions for inference and verified that they apply. This 
isn’t always true. 

Do women talk more than men? Equip male and 
female students with a small device that secretly 
records sound for a random 30 seconds during each 


12.5-minute period over two days. Count the words 21.29 
each subject speaks during each recording period, 
and from this, estimate how many words per day each 
subject speaks. The published report includes a table 


summarizing six such studies.'* Here are two of the six: 


STUDY 


ESTIMATED AVERAGE 
NUMBER (SD) OF WORDS 
SPOKEN PER DAY 
WOMEN MEN 


SAMPLE SIZE 
WOMEN MEN 


1 56 56 16,177 (7520) 16,569 (9108) 


2 27 20 16,496 (7914) 12,867 (8343) 
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Readers are supposed to understand that, for exam- 
ple, the 56 women in the first study had x = 16,177 
and s = 7520. It is commonly thought that women 
talk more than men. Does either of the two samples 
support this idea? For each study: 


(a) State hypotheses in terms of the population 
means for men (jy) and women (Lf). 

(b) Find the two-sample t statistic. 

(c) What degrees of freedom does Option 2 use to 
get a conservative P-value? 

(d) Compare your value of t with the critical values 
in Table C. What can you say about the P-value of 
the test? 

(e) What do you conclude from the results of these 
two studies? 


Alcohol and zoning out. Healthy men aged 21 to 
35 were randomly assigned to one of two groups: 
half received 0.82 grams of alcohol per kilogram of 
body weight; half received a placebo. Participants 
were then given 30 minutes to read up to 34 pages of 
Tolstoy’s War and Peace (beginning at Chapter 1, 
with each page containing approximately 22 lines of 
text). Every two to four minutes participants were 
prompted to indicate whether they were “zoning 
out.” The proportion of times participants indicated 
they were zoning out was recorded for each subject. 
The table below summarizes data on the proportion 
of episodes of zoning out. (The study report gave 
the standard error of the mean s/ Vn, abbreviated as 
SEM, rather than the standard deviation s.) 
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GROUP n x SEM 
Alcohol 5) 0.25 0.05 
Placebo aD) 0.12 0.03 


(a) What are the two sample standard deviations? 
(b) What degrees of freedom does the conservative 
Option 2 use for two-sample t procedures for these 
samples? 

(c) Using Option 2, give a 90% confidence interval 
for the mean difference between the two groups. 


Exercises 
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Stress and weight in rats. In a study of the effects 
of stress on behavior in rats, 71 rats were randomly 
assigned to either a stressful environment or a con- 
trol (nonstressful) environment. After 21 days, the 
change in weight (in grams) was determined for 
each rat. The table below summarizes data on weight 
gain.'© (The study report gave the standard error of 
the mean s/Vn , abbreviated as SEM, rather than 
the standard deviation s.) 


GROUP n x SEM 
Stress 20 26 3 
No stress Dil Be 2 


Voisin/Phanie/Science Source 


(a) What are the standard deviations for the two 
groups? 

(b) What degrees of freedom does the conservative 
Option 2 use for two-sample t procedures for these 
data? 

(c) Test the null hypothesis of no difference between 
the two group means against the two-sided alterna- 
tive. Use the degrees of freedom from part (b). 


Is Montessori preschool beneficial? Do educa- 
tion programs for preschool children that follow the 
Montessori method perform better than other pro- 
grams? A study compared five-year-old children in 
Milwaukee, Wisconsin, who had been enrolled in 
preschool programs from the age of three.!” 


(a) Explain why comparing children whose parents 
chose a Montessori school with children of other 
parents would not show whether Montessori schools 
perform better than other programs. (In fact, all 

the children in the study applied to the Montes- 

sori school. The school district assigned students to 
Montessori or other preschools by a random lottery.) 
(b) In all, 54 children were assigned to the Montes- 
sori school and 112 to other schools at age three. 
When the children were five, parents of 30 of the 
Montessori children and 25 of the others could be 
located and agreed to and subsequently participated 
in testing. This information reveals a possible source 
of bias in the comparison of outcomes. Explain why. 
(c) One of the many response variables was score 
on a test of ability to apply basic mathematics to 
solve problems. Here are summaries for the children 
who took this test: 
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GROUP n x Ss 
Montessori 30 19 SUL 
Control 25 iG 4.19 


Is there evidence of a difference in the population 
mean scores? (The researchers used two-sided alter- 
native hypotheses.) 


Is job satisfaction going down? In the last 10 years, 
several authors have stated that people are miserable 
in their jobs and have become increasingly unhappy 
over time. Job satisfaction scores from participants in 
the General Social Survey (GSS) in 1975 and 2006 
were studied. Higher scores indicate greater satisfac- 
tion, with 2.5 being a neutral score. Here are sum- 


maries of scores for 1975 and 2006.'® 


YEAR 
1975 
2006 


SAMPLE SIZE 
1165 
DAE 


MEAN 
Brot 
Bro, 


STD. DEV. 
0.81 
0.80 


(a) Is there a significant decrease in mean scores 
from 1975 to 2006? What do these data show about 
job satisfaction in 1975 compared to 2006? 

(b) The paper from which the data came includes 
several years from 1972 to 2006. 1975 was the year 
with the highest mean score. What, if any, effect 
does this information have on your assessment of 
whether job satisfaction has decreased over time? 


Illusory pattern perceptions. When one experi- 
ences a lack of control in one’s life, does one compen- 
sate by seeking structure elsewhere? Assign 36 under- 
graduate students to one of two conditions. All are 
asked to identify a concept associated with a series 
of “grainy” pictures that contain an embedded image 
and are presented to them sequentially. During the 
task, they can ask questions to help them determine 
the associated concept. Half of the subjects (the lack- 
of-control group) receive feedback that is random 
and noncontingent on their questions. The other 
half receive useful feedback (the in-control group). 
After attempting to complete the task, all subjects 
are presented with 12 grainy pictures that are similar 
to those used in the task but lack an embedded im- 
age. All are asked whether they perceive an image 
in the pictures, and the number of pictures identified 
as containing an image is counted for each subject. 
Here are the summary statistics:!” 


GROUP SIZE MEAN STD. DEV. 


Lack-of-control 18 


5.16 3.5 


In-control 18 


Bail 2.0 


(a) What degrees of freedom would you use in the 
conservative two-sample t procedures to compare 
the lack-of-control and in-control groups? 
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(b) What is the two-sample t test statistic for com- 
paring the mean number of pictures identified as 
having an image for the two groups? 

(c) Test the null hypothesis of no difference 
between the two population means against the two- 
sided alternative. Use your statistic from part (b) 
with degrees of freedom from part (a). 


Concussions and brain size. What is the effect of 
concussions on the brain? Researchers measured the 
brain sizes (hippocampal volume in microliters) of 
25 collegiate football players with a history of clini- 
cian-diagnosed concussion and 25 collegiate football 
players without a history of concussion. Here are the 
summary statistics:7° 


GROUP GROUP SIZE MEAN STDDEV 
Concussion 25 5784 609.3 
Non-concussion 25 6489 815.4 
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(a) Is there evidence of a difference in mean brain 
size between football players with a history of con- 
cussion and those without concussions? 

(b) The researchers in this study stated that partici- 
pants were “consecutive cases of healthy National 
Collegiate Athletic Association Football Bowl Sub- 
division Division I football athletes with (n = 25) 
or without (n = 25) a history of clinician-diagnosed 
concussion .. . between June 2011 and August 2013” 
at a U.S. psychiatric research institute specializing 
in neuroimaging among collegiate football players. 
What effect does this information have on your 
conclusions in part (a)? 


Coaching and SAT scores. Coaching companies 
claim that their courses can raise the SAT scores of 
high school students. Of course, students who retake 
the SAT without paying for coaching generally raise 
their scores, too. A random sample of students who 
took the SAT twice found 427 who were coached 
and 2733 who were uncoached.”! Starting with their 
Verbal scores on the first and second tries, we have 
these summary statistics: 


TRY 1 TRY 2 GAIN 
n x s x s x s 
Coached 427 500 92 529 97 29 59 
Uncoached 2733 506 101 527 ~~ 101 Mil 52 


The summary statistics for Gain are based on the 
changes in the scores of the individual students. 
Let’s first ask if students who are coached increased 
their scores significantly. 


(a) You could use the information on the Coached line 
to carry out either a two-sample t test comparing Try 1 
with Try 2 for coached students, or a matched pairs t 
test using Gain. Which is the correct test? Why? 


(b) Carry out the proper test. What do you conclude? 
(c) Give a 99% confidence interval for the mean 
gain of all students who are coached. 


21.35 Coaching and SAT scores, continued. What 
we really want to know is whether coached stu- 
dents improve more than uncoached students, and 
whether any advantage is large enough to be worth 
paying for. Use the information in the previous exer- 
cise to answer these questions. 


(a) Is there good evidence that coached students 
gained more on the average than uncoached students? 
(b) How much more do coached students gain on 
the average? Give a 99% confidence interval. 

(c) Based on your work, what is your opinion: do 
you think coaching courses are worth paying for? 


21.36 Coaching and SAT scores: critique. The data you 
used in the previous two problems came from a ran- 
dom sample of students who took the SAT twice. The 
response rate was 63%, which is pretty good for nongov- 
ernment surveys, so let’s accept that the respondents do 
represent all students who took the exam twice. None- 
theless, we can’t be sure that coaching actually caused 
the coached students to gain more than the uncoached 
students. Explain briefly but clearly why this is so. 


Exercises 21.37 to 21.44 include the actual data. To apply the 
two-sample t procedures, use Option 1 if you have technology that 
implements that method. Otherwise, use Option 2. 


21.37 Improving your tips. Researchers gave 40 index 
cards to a waitress at an Italian restaurant in New 
Jersey. Before delivering the bill to each customer, the 
waitress randomly selected a card and wrote on the bill 
the same message that was printed on the index card. 
Twenty of the cards had the message, “The weather is 
supposed to be really good tomorrow. I hope you enjoy 
the day!” Another 20 cards contained the message, 
“The weather is supposed to be not so good tomorrow. 
I hope you enjoy the day anyway!” After the customers 
left, the waitress recorded the amount of the tip (per- 
cent of bill) before taxes. Here are the tips for those 
receiving the good-weather message:” {j).) TIP4 


Ae} Ish f USES) Obes AAW) sie ies MAS) if, 710),3) 
DES) Mba PMO) BObay Pg) HO) Deh EM A) HILT 


The tips for the 20 customers who received the bad 
weather message are 


18.0 19.1 19.2 18.8 18.4 19.0 18.5 16.1 16.8 14.0 
17.0 13.6 17.5 20.0 20.2 18.8 18.0 23.2 18.2 19.4 


(a) Make stemplots or histograms of both sets of 
data. Because the distributions are reasonably sym- 
metric with no extreme outliers, the t procedures 
will work well. 

(b) Is there good evidence that the two differ- 

ent messages produce different percent tips? State 


21.38 


21.39 


21.40 
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hypotheses, carry out a two-sample t test, and report 
your conclusions. 


Do good smells bring good business? In Exercise 21.9, 
you examined the effects of a lavender odor on cus- 
tomer behavior in a small restaurant. Lavender is 
a relaxing odor. The researchers also looked at the 
effects of lemon, a stimulating odor. The design of 
the study is described in Exercise 21.9. Here are the 
times in minutes that customers spent in the restau- 
rant when no odor was present: ODORS3 


NOS RECS OCR 2 eo S42 92 
8 69 73 87 109 115 91 84 76 96 
LOT MISO NOT SS OS AS Si SOI 755886 


When a lemon odor was present, customers lingered 
for these times: 


78 104 74 75 112 88 105 97 101 89 
'o]s Jo A 0S 6 01s 2 Go Lo 6 0) 
108 GO 965 94 56 90 1S 97 


(a) Examine both samples. Does it appear that 

use of two-sample t procedures is justified? Do the 
sample means suggest that a lemon odor changes 
the average length of stay? 

(b) Does a lemon odor influence the length of time 
customers stay in the restaurant? State hypotheses, 
carry out at test, and report your conclusions. 


Improving your tips, continued. Use the data in 
Exercise 21.37 to give a 95% confidence interval for 
the difference between the mean percent tips for the 
two different messages. 2) 


The power of positive thinking? Does the way the 
press depicts the economic future affect the stock mar- 
ket? To investigate this, researchers analyzed the lon- 
gest article about the crisis from the front page of the 
Money section of USA Today from a randomly chosen 
weekday of each week between August 2007 and June 
2009. Articles were rated as to how positive or negative 
they were about the economic future. For each week, 
the change in the Dow Jones Industrial Average (aver- 
age DJIA value the week after the article appeared 
minus the average the week the article appeared) was 
computed. Positive values of the change indicate that 
the DJIA increased.’ Here are the changes in DJIA 
corresponding to very positive articles: 


35) 700) = 275 1s =25 25 50 
Ds 13) =P) =H 7X00) ZNO) 7/3) 
Here are the changes in DJIA values corresponding 
to very negative articles: 
150 S100) 225 15 SMS =275 375 
= O 125 175 475 
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Is there good evidence that the DJIA performs differ- 
ently after very positive articles than after very nega- 
tive articles? 


(a) Do the sample means suggest that there is a 
difference in the change in the DJIA after very posi- 
tive articles versus after very negative articles? 

(b) Make stemplots for both samples. Are there any 
obvious departures from Normality? 

(c) Test the hypothesis Ho: w; = fz against the 
two-sided alternative. What do you conclude from 
part (a) and from the result of your test? 

(d) Among a host of different factors that are claimed 
to have triggered the economic crisis of 2007-09, 

one was a “culture of irresponsibility” in the way the 
future was depicted in the press. Do the data provide 
any evidence that negative articles in the press con- 
tributed to poor performance of the DJIA? 


What do you make of Mitt? Twenty-nine college 
students, identified as having a positive attitude about 
Mitt Romney as compared to Barack Obama in the 
2012 presidential election, were asked to rate how 
trustworthy the face of Mitt Romney appeared, as rep- 
resented in their mental image of Mitt Romney’s face. 
Ratings were on a scale of 0 to 7, with 0 being “not at 
all trustworthy” and 7 being “extremely trustworthy.” 
Here are the 29 ratings: Si 


MSS 3) Aol sha} Sp BW Se ais) 3h) Anil 
Ady Gis) pel Ap) Se) Sy) Geil SO) SHO) Ae 
Go) i9) BV) 5 y5h Vhs) toy SMO) 3h,5h SIT 


Twenty-nine college students identified as having a 
negative attitude about Mitt Romney as compared 
to Barack Obama in the 2012 presidential election 
were also asked to rate how trustworthy the face of 
Mitt Romney appeared. Here are the 29 ratings: 


TESS 43) 434225 20 Se 4a 23 
Mesh 340) SS) GHEY Gish 5) ahte} Shes) PS) Ike7/ 
8333539 49343 :8 33 384 


(a) Do the sample means suggest that there is a 
difference in the mean trustworthy ratings between 
the two groups? 

(b) Make stemplots for both samples. Are there any 
obvious departures from Normality? 

(c) Test the hypothesis Ho: ww; = pz against the 
one-sided alternative that students with a positive 
attitude rate Mitt Romney more trustworthy than 
those with a negative attitude. What do you con- 
clude from part (a) and from the result of your test? 
How big is the difference? Continue your work from 
Exercise 21.40. A researcher wants to know how big 
a difference there is in the change in the DJIA after 


reports of a positive economic outlook compared to 
reports of a negative economic outlook. Give a 90% 
confidence interval for the difference in mean change 
in the DJIA. 


21.43 Do women talk more than men? Another 
study. Exercise 21.27 described a series of six studies 
investigating the number of words women and men 
speak per day. Exercise 21.27 gives results from two 
of these studies. Here are the results from another of 
these studies. The estimated numbers of words spo- 
ken per day for 27 women are 


15,551 13 O18 O73 26,45) ZS 8 S99 47,63 
25,246 8,427 6,998 24,876 6,272 10,047 15,569 
39,681 23,079 24,814 19,287 10,351 8,866 10,827 
12,584 12,764 19,086 26,852 17,639 16,616 


The estimated numbers of words spoken per day for 
20 men are 


28,408 10,084 15,931 21,688 37,786 10,575 12,880 
11,071 17,799 13,182 8,918 6,495 8,153 7,015 
4,429 10,054 3,998 12,639 10,974 5,255 


Does this study provide good evidence that women 
talk more than men, on average? 


(a) Make stemplots for both samples. Are there any 
obvious deviations from Normality? In spite of these 
deviations from Normality, it is safe to use the t 
procedures. Explain. 

(b) Test the hypothesis Ho: ww, = pz against the one- 
sided alternative that the mean number of words per 
day for women (1;) is greater than the mean number 
of words per day for men (12). What do you conclude? 


Do birds learn to time their breeding? Blue titmice eat cater- 
pillars. The birds would like lots of caterpillars around when they 
have young to feed, but they breed earlier than peak caterpillar 
season. Do the birds time when they breed based on the previ- 
ous year’s caterpillar supply? Researchers randomly assigned seven 
pairs of birds to have the natural caterpillar supply supplemented 
while feeding their young and another six pairs to serve as a control 
group relying on natural food supply. The next year, they measured 
how many days after the caterpillar peak the birds produced their 
nestlings.> Exercises 21.44 to 21.46 are based on this experiment. 


© Hugh Clark/Frank Lane Picture Agency/ 


Corbis 


21.44 


21.45 


21.46 


Did the randomization produce similar groups? The 
first thing to do is to compare the two groups in the first 
year. The only difference should be the chance effect of 
the random assignment. The study report says: “In the 
experimental year, the degree of synchronization did not 
differ between food-supplemented and control females.” 
For this comparison, the report gives t = —1.05. What 
type of t statistic (paired or two-sample) is this? What are 
the degrees of freedom for this statistic? Show that this t 
leads to the quoted conclusion. 


Did the treatment have an effect? The investiga- 
tors expected the control group to adjust their breed- 
ing date the next year, whereas the well-fed supple- 
mented group had no reason to change. The report 
continues: “but in the following year food-supple- 
mented females were more out of synchrony with the 
caterpillar peak than the controls.” Here are the data 


(days behind the caterpillar peak): BREED 


An 23) hall dO) 4 IL? 
Supplemented 15.5 11.3 5.4 16.5 11.3 11.4 7.7 


Control 


Carry out at test and show that it leads to the quoted 
conclusion. 


Year-to-year comparison. Rather than comparing 
the two groups in each year, we could compare the 
behavior of each group in the first and second years. 
The study report says: “Our main prediction was 
that females receiving additional food in the nestling 
period should not change laying date the next year, 
whereas controls, which (in our area) breed too late 
in their first year, were expected to advance their 
laying date in the second year.” 

Comparing days behind the caterpillar peak in 
Years 1 and 2 gave t = 0.63 for the control group 
and t = —2.63 for the supplemented group. Are 
these paired or two-sample t statistics? What are 
the degrees of freedom for each t? Show that these 
t-values do not agree with the prediction. 


The remaining exercises ask you to answer questions from data 
without having the details outlined for you. The exercise statements 
give you the State step of the four-step process. Follow the Plan, 
Solve, and Conclude steps as illustrated in Examples 21.2 and 
21.3 for significance tests and Example 21.4 for confidence inter- 
vals. Remember that examining the data and discussing the condi- 
tions for inference are part of the Solve step. 


21.47 


Thinking about money changes behavior. Kath- 
leen Vohs of the University of Minnesota and her co- 
workers carried out several randomized comparative 
experiments on the effects of thinking about money. 
Here’s part of one such experiment.”° Ask student 
subjects to unscramble 30 sets of five words to make 
a meaningful phrase from four of the five words. The 
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control group unscrambled phrases like “cold it desk 
outside is” into “it is cold outside.” The treatment 
group unscrambled phrases that lead to thinking 
about money, turning “high a salary desk paying” into 
“a high-paying salary.” Then each subject worked a 
hard puzzle, knowing that he or she could ask for help. 
Here are the times in seconds until subjects asked for 
help. For the treatment group: £j))) MNYTHNK 


CODA 44242 OO MITA Se oreo 466m 443 
531 135 241 476 482 362 69 160 


For the control group: 


IS R2 724135294 O04 5 S926 
400 92 64 88 142 141 373 156 


The researchers suspected that money is connected 
with self-sufficiency, so that the treatment group will 
ask for help less quickly on the average. Do the data 
support this idea? 


21.48 Adolescent obesity. Adolescent obesity is a serious 
42 health risk affecting more than 5 million young people 


in the United States alone. Laparoscopic adjustable 
gastric banding has the potential to provide a safe and 
effective treatment. Fifty adolescents between 14 and 
18 years old with a body mass index higher than 35 
were recruited from the Melbourne, Australia, com- 
munity for the study.?’ Twenty-five were randomly 
selected to undergo gastric banding, and the remain- 
ing twenty-five were assigned to a supervised lifestyle 
intervention program involving diet, exercise, and 
behavior modification. All subjects were followed for 
two years. Here are the weight losses in kilograms for 
the subjects who completed the study. In the gastric 
banding group: £jj))) ADOBESE 


35.6 81.4 57.6 32.8 31.0 37.6 36.5—5.4 27.9 49.0 64.8 39.0 
ASO) SSS) IRE POY IS AIG Sisk, bis ike) MOA) S25) 20 


In the lifestyle intervention group: 


HO) 740) SO) ACs LIL) UGS) INQ) 4.0 
Haley Moe! SCC) M0) SJL 4s) Alloy) —IN4e) 1s) 


Is there good evidence that gastric banding is supe- 
rior to the lifestyle intervention program? 


21.49 Active versus traditional learning. Can active 


42 learning improve knowledge retention? Two under- 


graduate calculus-based engineering _ statistics 
courses were taught in different academic quarters, 
with one employing active-learning methods and 
the other using traditional learning methods. The 
traditional class was taught lecture-style with rela- 
tively little in-class interaction between peers and 
with the instructor. The active-learning course 
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integrated four group projects into the curriculum, 
with in-class time devoted to group work on the 
projects and fewer homework assignments. To assess 
knowledge retention, two five-question versions of a 
test were created. They had similar but not identi- 
cal questions covering core statistics topics, worth a 
total of 18 possible points. All students in both sec- 
tions were randomly given one version of the test as 
part of their final exam. Then, eight months later, a 
volunteer subset of the original students were given 
the version that they had not taken previously. To 
encourage students to take the second version of 
the exam, a $10 gift card to the university bookstore 
was given to each participant. The change in the 
score from the first version to the second is used to 
measure a student’s long-term ability to retain the 
course material. £jj) actTvLRN 


Here are the changes in exam scores for the 15 
students in the active group:”* 


o> 7 8 @ 3 @ 2 5 il 
a2 £ 4 3 SF 


The changes in exam scores for the 23 students in 
the traditional group are: 


2 By 
© i fi 


2 ty 3h Al 
Wi sg 


¢ © & 

5 oO 

5) 2 
Is there good evidence that active learning is supe- 
rior to traditional lecturing? 


Each day I am getting better in math. A “sublimi- 
nal” message is below our threshold of awareness but 
may nonetheless influence us. Can subliminal mes- 
sages help students learn math? A group of students 
who had failed the mathematics part of the City Uni- 
versity of New York Skills Assessment Test agreed to 
participate in a study to find out. £j))) suBiim 


All received a daily subliminal message, flashed on 
a screen too rapidly to be consciously read. The treat- 
ment group of 10 students (chosen at random) was 
exposed to “Each day I am getting better in math.” 
The control group of eight students was exposed 
to a neutral message, “People are walking on the 
street.” All students participated in a summer pro- 
gram designed to raise their math skills, and all took 
the assessment test again at the end of the program. 
Table 21.3 gives data on the subjects’ scores before 
and after the program.” Is there good evidence that 
the treatment brought about a greater improvement 
in math scores than the neutral message? How large is 
the mean difference in gains between treatment and 
control? (Use 95% confidence.) 


TREATMENT GROUP CONTROL GROUP 
BEFORE AFTER BEFORE AFTER 
18 24 18 29 
18 D5) 24 78) 
Jl 33 20 24 
18 29 18 26 
18 33 24 38 
20 36 22 ail 
23) 34 15 22) 
3} 36 19 Sill 

21 34 
17 Dil 
21.51 Active versus traditional learning, continued. 


21.52 


(a) Use the data in Exercise 21.49 to give a 90% 
confidence interval for the difference in the mean 
change in score for students in the active and tradi- 
tional classes. 

(b) Give a 90% confidence interval for the mean 
change in score of students in the active-learning 
class. fijy,) ACTVLRN 


Tropical flowers. Different varieties of the tropical 
flower Heliconia are fertilized by different species of 
hummingbirds. Over time, the lengths of the flow- 
ers and the forms of the hummingbirds’ beaks have 
evolved to match each other. Here are data on the 
lengths in millimeters of two color varieties of the 
same species of flower on the island of Dominica:*° 


FLOWERS 


Art Wolfe/Getty Images 


Exploring the Web 515 


H. caribaea RED 
41.90 42.01 41.93 43.09 41.47 41.69 39.78 40.57 
39.63 42.18 40.66 37.87 39.16 37.40 38.20 38.07 


SS MOTT SOT 3819) 93823 38.80 Sree SS KOlll FEMALE STUDENTS 
13, 8) il sho) 5) 3} i  & 3 3 25 25 
H. caribaea YELLOW a a en a ae ee 
36518) 31202 36.52 36s 36.0399 35:45 3815 Biel suas ay eaG 63.6 8 24 7 
35.17 36.82 36.66 35.68 36.03 34.57 34.63 
4 5 35 4 2 «1 y > JF 82 CG 4 2 
Is there good evidence that the mean lengths 1 a G 365 5 25 ilo 5 4 98 8 il 
of the two varieties differ? Estimate the difference 7s) 2S = fe fe A me a 
between the population means. (Use 95% confi- 
dence.) 4 5 1 5 3 4 10 7 3 4 4 4° 4 
a an 
21.53 Student drinking. A professor asked her sopho- 
more students, “How many drinks do you typically MALE STUDENTS 
45 have per session? (A drink is defined as one 12 i 715 8 15 3 4 if 5S it 45 @ 4 io 
oz beer, one 4 oz glass of wine, or one | oz shot 
2 , 16 4 8 5 oO Ff F 3 5 OS i Ww 4 
of liquor.)” Some of the students didn’t drink. 
Table 21.4 gives the responses of the female and e28 Sf > 8 ee 1 8 8 
male students who did drink.*! It is likely that some 15s We 2 6 5 2io 7 © 5 & 
of the students exaggerated a bit. The sample is all 7 3 6 2 5 2 So SSO 10 10 4 
students in one large sophomore-level class. The 5 Te ; 
class is popular, so we are tentatively willing to oe 4 ee eee 
regard its members as an SRS of sophomore stu- © oa 9 


dents at this college. Do a complete analysis that 
reports on £jj,| DRINKS 


(a) the drinking behavior claimed by sophomore 
women. 

(b) the drinking behavior claimed by sophomore 
men. 

(c) a comparison of the behavior of women and 
men. 


Exploring the Web 


21.54 A two-sample t test example. Find an example of a two-sample t test on the 
web. The Journal of the American Medical Association (j ama .ama-assn.org), 
Science Magazine (www. sciencemag.org), the Canadian Medical Association 
Journal (www.cmaj.ca), the Journal of Statistics Education (www.amstat. 
org/publications/ se), or perhaps the Journal of Quantitative Analysis in 
Sports (www. bepress .com/jqas) are possible sources. To help locate an ar- 
ticle, look through the abstracts of articles. Once you find a suitable article, read 
it and then briefly describe the study (including why it is a two-sample study) and 
its conclusions. If P-values, means, standard deviations, t statistics, and the like are 
reported, be sure to include them in your summary. Also, be sure to give the refer- 
ence (either the web link or the journal, issue, year, title of the paper, authors, and 
page numbers). 
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Antibiotics after surgery. If your college has online access to the Archives 
of Otolaryngology—Head and Neck Surgery, read the article “Duration-Relat- 
ed Efficacy of Postoperative Antibiotics Following Pediatric Tonsillectomy: 
A Prospective, Randomized, Placebo-Controlled Trial” (available online 
at archotol .ama-assn.org/cgi/content/ful1/135/10/984). The 
authors appear to use two-sample t procedures in the paper. After reading the 
article, read the (brief) discussion on the Chance website, www. causeweb. 
org/wiki/chance/index.php/Chance_News_57. What are some criti- 
cisms or concerns expressed about the study in the Chance article? 


Cavan Images/Getty Images 


nference about a 
Population Proportion 


ur discussion of statistical inference to this point has concerned making infer- 

ences about population means. Interest in the mean response for a population, 

or a comparison of the means of two populations, is common when the response 
variable is quantitative and takes a numerical value with some unit of measurement. 
Now we turn to questions about the proportion of some outcome in a population. Even 
when the original response is a quantitative variable, such as total cholesterol, we might 
be more interested in whether or not someone has cholesterol greater than 200 mg/dL 
(outcome of interest). Our proportion of interest is then the population proportion of 
adults who have cholesterol greater than 200 mg/dL, and the methods of this chapter 
apply. Here are some further examples that call for inference about population proportions. 


EXAMI the 


How common is behavior that puts people at risk of AIDS? In the early 1990s, the 
landmark National AIDS Behavioral Surveys interviewed a random sample of 2673 
adult heterosexuals. Of these, 170 had more than one sexual partner in the past 
year. That’s 6.36% of the sample.' Based on these data, what can we say about the 
percent of all adult heterosexuals who have multiple partners? We want to estimate a 
single population proportion. This chapter concerns inference about one proportion. m 


CRAFTER 


In this chapter 
we cover... 


22.1 The sample proportion p 


22.2 Large-sample confidence 
intervals for a proportion 


22.3 Choosing the sample size 


22.4 Significance tests for a 
proportion 


22.5 Plus four confidence 
intervals for a proportion* 
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sample proportion 


A surprising number of young adults (aged 19 to 25) still live at home with their 
parents. A random sample of 2253 men and 2629 women in this age group found 
that 44% of the men but only 35% of the women lived at home. Is this significant 
evidence that the proportions living at home differ in the populations of all young 
men and all young women? We want to compare two population proportions. This is 
the topic of Chapter 23. m 


To make inferences about a population mean p, we use the mean xX of a random 
sample from the population. The reasoning of inference starts with the sampling 
distribution of x. Now we follow the same pattern, replacing means by proportions. 


We are interested in the unknown proportion p of a population that has some 
outcome. For convenience, call the outcome we are looking for a “success.” In 
Example 22.1, the population is adult heterosexuals, and the parameter p is the 
proportion who have had more than one sexual partner in the past year. To 
estimate p, the National AIDS Behavioral Surveys used random dialing of tele- 
phone numbers to contact a sample of 2673 people. Of these, 170 said they had 
had multiple sexual partners. The statistic that estimates the parameter p is the 
sample proportion 


a number of successes in the sample 


total number of individuals in the sample 


170 
= — = 0.0636 
2673 
Read the sample proportion p as “p-hat.” 
How good is the statistic f as an estimate of the parameter p? To find out, we 
ask, “What would happen if we took many samples?” The sampling distribution of 
p answers this question. Here are the facts.’ 


Sampling Distribution of a Sample Proportion 


Draw an SRS of size n from a large population that contains proportion p of successes. 
Let f be the sample proportion of successes, 


number of successes in the sample 


n 
Then: 


«© The mean of the sampling distribution is p. 


« The standard deviation of the sampling distribution is 


pl — p) 


n 


As the sample size increases, the sampling distribution of f becomes 
approximately Normal. That is, for large n, p has approximately the 


N(p, Vp(1 — p)/n) distribution. 
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of successes 


FIGURE 22.1 
Select a large SRS from a population in which the proportion p are successes. The sampling distribution 
of the proportion p of successes in the sample is approximately Normal. The mean is p and the standard 


deviation is Vp(1 — p)/n. 


Figure 22.1 summarizes these facts in a form that helps you recall the big idea 
of a sampling distribution. The behavior of sample proportions fp is similar to the 
behavior of sample means x, except that the distribution of p is only approximately 
Normal. The mean of the sampling distribution of f is the true value of the popula- 
tion proportion p. That is, p is an unbiased estimator of p. The standard deviation 
of fp gets smaller as the sample size n gets larger, so that estimation is likely to be 
more accurate when the sample is larger. As is the case for x, the standard deviation 
gets smaller only at the rate Vn. We need four times as many observations to cut 
the standard deviation in half. 


Suppose that in fact 6% of all adult heterosexuals had more than one sexual partner 
in the past year (and would admit it when asked). The National AIDS Behavioral 
Surveys interviewed a random sample of 2673 people from this population. In many 
such samples, the proportion p of the 2673 people in the sample who had more 
than one partner would vary according to (approximately) the Normal distribution 
with mean 0.06 and standard deviation 


nes hs = 
n 2673 
= V0.0000211 = 0,00459 = 


Apply Your Knowledge 


22.1 Staph Infections. A study investigated ways to prevent staph infections in 
surgery patients. In a first step, the researchers examined the nasal secretions 
of arandom sample of 6771 patients admitted to various hospitals for surgery. 
They found that 1251 of these patients tested positive for Staphylococcus 
aureus, a bacterium responsible for most staph infections? 


(a) Describe the population and explain in words what the parameter p is. 


(b) Give the numerical value of the statistic f that estimates p. 
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standard error of p 


FIGURE 22.2 

With probability C, lies within 
+z*Vp(1 — p)/n of the unknown 
population proportion p. That is, 

in these samples p lies within 


+z*V/p(1 — p)/n of p. 
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22.2 The 68-95-99.7 Rule and p. Although over 50% of American adults be- 
lieve the maxim that breakfast is the most important meal of the day, only 
about 30% eat breakfast daily. A cereal manufacturer contacts an SRS of 
1000 American adults. If the sample were repeated many times, what would 
be the range of sample proportions who eat breakfast daily according to the 


95 part of the 68-95-99.7 rule? 


22.3 Social Network Sites. About 90% of young adult Internet users (aged 18 
to 29) use social network sites.’ Suppose that a sample survey contacts an 
SRS of 1500 young adult Internet users and calculates the proportion pf in 
this sample who use social network sites. 


(a) What is the approximate distribution of f? 


(b) If the sample size were 6000 rather than 1500, what would be the 
approximate distribution of p? 


We can follow the same path from sampling distribution to confidence interval 
as we did for x in Chapter 15. To obtain a level C confidence interval for p, we start 
by capturing the central probability C in the distribution of f. To do this, go out 
z* standard deviations from the mean p, where z* is the critical value that captures 
the central area C under the standard Normal curve. Figure 22.2 shows the result. 
The confidence interval is 


P b( = b) 
nn a 


This won’t do, because we don’t know the value of p. So we replace the standard 
deviation by the standard error of p 


Sampling 
distribution 
of p 


Probability C 


il t i 


. p é 
poz 7 (unknown) pa 
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to get the confidence interval 


As with previous confidence intervals, this interval has the familiar form 
estimate + 2*SE sstimate 


We can trust this confidence interval only for large samples. Because the number 
of successes must be a whole number, using a continuous Normal distribution to 
describe the behavior of p may not be accurate unless n is large. Because the ap- 
proximation is least accurate for populations that are almost all successes or almost 
all failures, we require that the sample have both enough successes and enough fail- 
ures rather than that the overall sample size be large. Pay attention to both conditions 
for inference in the box below that summarizes the confidence interval: we must as usual 
be willing to regard the sample as an SRS from the population, and the sample 
must have both enough successes and enough failures. The condition on successes A 
and failures ensures that the sample size is large enough to use the Normal ap- 
proximation without knowing p. 


Large-Sample Confidence Interval 


for a Population Proportion 


Draw an SRS of size n from a large population that contains an unknown proportion 
p of successes. An approximate level C confidence interval for p is 
. aD 


n 


hae 


where x* is the critical value for the standard Normal density curve with area C 
between —z* and z*. 


Use this interval only when the numbers of successes and failures in the sample are 
both at least 15.° 


Why not t? Notice that we don’t change z* to t* when we replace the standard 
deviation by the standard error. When the sample mean x estimates the popula- 
tion mean p, a separate parameter o describes the variability of the distribution 
of x. We separately estimate a, and this leads to at distribution. When the sample 
proportion fp estimates the population proportion p, the variability depends on p, 
not on a separate parameter. There is no t distribution—we just make the Normal 
approximation a bit less accurate when we replace p in the standard deviation by p. 


The four-step process for any confidence interval is outlined on page 381. 42 


)STATE® The National AIDS Behavioral Surveys found that 170 of a sample of 2673 
adult heterosexuals had had multiple partners. That is, 


What can we say about the population of all adult heterosexuals? 
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PLAN: We will give a 99% confidence interval to estimate the proportion p of all 
adult heterosexuals who have multiple partners. 


ESRF First verify the conditions for inference: 


= The sampling design was a complex stratified sample, and the survey used infer- 
ence procedures for that design. The overall effect is close to an SRS, however. 

= The sample is large enough: the numbers of successes (170) and failures 
(2503) in the sample are both much larger than 15. 


The sample size condition is easily satisfied. The condition that the sample be an SRS 
is only approximately met. 

A 99% confidence interval for the proportion p of all adult heterosexuals with mul- 
tiple partners uses the standard Normal critical value z* = 2.576. The confidence 
interval is 


p(1 — p 0.0636)(0.9364 
PU =P) P) = 0.0636 + 25764] x ) 
n 2673 


= 0.0636 + 0.0122 
= 0.0514 to 0.0758 
(ESR We are 99% confident that the percent of adult heterosexuals who 


have had more than one sexual partner in the year prior to the survey lies between 
about 5.1% and 7.6%. m 


BF wos 


A SMOKER? 


When estimating a proportion p, 

be sure you know what counts as a 
“success.” The news says that 20% 
of adolescents smoke. Shocking. It 
turns out that this is the percent 
who smoked at least once in the past 
month. If we say that a smoker is 
someone who smoked on at least 20 
of the past 30 days and smoked at 
least half a pack on those days, fewer 
than 4% of adolescents qualify. 


As usual, the practical problems of a large sample survey weaken our con- 
fidence in the AIDS survey’s conclusions. Only people in households with 
landline telephones could be reached. Although at the time of the survey 

about 89% of American households had landline telephones, as the number of 
cell phone only users increases, using a sample of households with landline phones 
is becoming less acceptable for surveys of the general population (see page 216). 
Additionally, some groups at high risk for AIDS, such as people who inject illegal 
drugs, often don’t live in settled households and were therefore underrepresented 
in the sample. About 30% of the people reached refused to cooperate. A nonre- 
sponse rate of 30% is not unusual in large sample surveys, but it may cause some 
bias if those who refuse differ systematically from those who cooperate. The 
survey used statistical methods that adjust for unequal response rates in different 
groups. Finally, some respondents may not have told the truth when asked about 
their sexual behavior. The survey team tried to make respondents feel comfort 
able. For example, Hispanic women were interviewed only by Hispanic women, 
and Spanish speakers were interviewed by Spanish speakers with the same re- 
gional accent (Cuban, Mexican, or Puerto Rican). Nonetheless, the survey report 
says that some bias is probably present: 


It is more likely that the present figures are underestimates; some respondents 
may underreport their numbers of sexual partners and intravenous drug use 
because of embarrassment and fear of reprisal, or they may forget or not know 
details of their own or of their partner’s HIV risk and their antibody testing 
history.’ 


Reading the report of a large study like the National AIDS Behavioral Surveys 
reminds us that statistics in practice involves much more than formulas for 
inference. 
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() LaunchPad Online Resources 


* The Snapshots video, Inference for One Proportion, provides the details of 
constructing a large-sample confidence interval for a proportion through an 
interesting example involving an opinion poll. 


* The StatClips Examples video, Confidence Intervals: Intervals for Proportions 
Example C, illustrates the computation of a large-sample confidence interval for 
a proportion. 


Apply Your Knowledge 


22.4 No Confidence Interval. In the National AIDS Behavioral Surveys sam- 
ple of 2673 adult heterosexuals, 0.2% (that’s 0.002 as a decimal fraction) 
had both received a blood transfusion and had a sexual partner from a group 
at high risk of AIDS. Explain why we can’t use the large-sample confidence 
interval to estimate the proportion p in the population who share these two 
risk factors. 


22.5 Canadian Attitudes toward Guns. Canada has much stronger gun con- 
trol laws than the United States, and Canadians support gun control more 
strongly than do Americans. A sample survey asked a random sample of 1505 
adult Canadians, “Do you agree or disagree that all firearms should be reg- 
istered?” Of the 1505 people in the sample, 1288 answered either “Agree 
strongly” or “Agree somewhat.”® 


(a) The survey dialed residential telephone numbers at random in all 10 
Canadian provinces (omitting the sparsely populated northern territo- 
ries). Based on what you know about sample surveys, what is likely to 
be the biggest weakness in this survey? 


(b) Nonetheless, act as if we have an SRS from adults in the Canadian 
provinces. Give a 95% confidence interval for the proportion who 
support registration of all firearms. 


22.6 Weightlifting Injuries. Resistance training is a popular form of condition- 
ing aimed at enhancing sports performance and is widely used among high 42 
school, college, and professional athletes, although its use for younger ath- 
letes is controversial. A random sample of 4111 patients aged 8-30 admit- 
ted to U.S. emergency rooms with the Consumer Product Safety Commis- 
sion code “weightlifting” were obtained. These injuries were classified as 
“accidental” if caused by dropped weight or improper equipment use. Of 
the 4111 weightlifting injuries, 1552 were classified as accidental.’ Give a 
90% confidence interval for the proportion of weightlifting injuries in this 
age group that were accidental. Follow the four-step process as illustrated 
in Example 22.4. 


In planning a study, we may want to choose a sample size that will allow us to 
estimate the parameter within a given margin of error. We saw earlier (page 425) 
how to do this for a population mean. The method is similar for estimating a popu- 
lation proportion. 

The margin of error in the large-sample confidence interval for p is 
, /PG = b) 


n 
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wistiS NEW YORK Here z* is the standard Normal critical value for the level of confidence we want. 
tt NEW YORK Because the margin of error involves the sample proportion of successes p, we need 


New VorleC ei they sacs to guess this value when choosing n. Call our guess p*. Here are two ways to get p*: 


bigger, richer, faster, and ruder. 1. Use a guess p* based on a pilot study or on past experience with similar studies. 


Maybe there's something to that. You can do several calculations to cover the range of values of f you might get. 
The sample survey firm Zogby 


International says that as a national 2. Use p* = 0.5 as the guess. The margin of error m is largest when p* = 0.5, so 

average it takes five telephone calls this guess is conservative in the sense that if we get any other p when we do 

to reach a live person. When calling our study, we will get a margin of error smaller than planned. 

to New York, it takes 12 calls. Survey 

firms assign their best interviewers Once you have a guess p*, the recipe for the margin of error can be solved to give 
y g p g g 

to make calls to New York and often the sample size n needed. Here is the result for the large-sample confidence interval. 


pay them bonuses to cope with the 


For simplicity, use this result even if you plan to use the plus four interval discussed 
stress. 


in Section 22.5. 


Sample Size for Desired Margin of Error 


The level C confidence interval for a population proportion p will have margin of er- 
ror approximately equal to a specified value m when the sample size is 


n=(])pra—p 


where p* is a guessed value for the sample proportion. The margin of error will always 
be less than or equal to m if you take the guess p* to be 0.5. 


Which method for finding the guess p* should you use? The n you get doesn’t 
change much when you change p* as long as p* is not too far from 0.5. You can 
use the conservative guess p* = 0.5 if you expect the true p to be roughly between 
0.3 and 0.7. If the true f is close to 0 or 1, using p* = 0.5 as your guess will give a 
sample much larger than you need. Try to use a better guess from a pilot study when 
you suspect that p will be less than 0.3 or greater than 0.7. 


large city. You are planning a sample survey to determine what percent of the 
voters intend to vote for Chavez. You will contact an SRS of registered voters 
in the city. You want to estimate the proportion p of Chavez voters with 95% 
confidence and a margin of error no greater than 3%, or 0.03. How large a 
sample do you need? 


42 MSTATE? Gloria Chavez and Ronald Flynn are the candidates for mayor in a 


VOTE PLAN: Find the sample size n needed for margin of error m = 0.03 and 95% con- 
fidence. The winner's share in all but the most lopsided elections is between 30% 
CHAVEZ and 70% of the vote. You can use the guess p* = 0.5. 


ESQRVER The sample size you need is 


1.96 \? 
= | —— ] (0.5)(1 — 0.5) = 1067.1 
n Gar i( ) 


i~ Round the result up to n = 1068. (Rounding down would give a margin of error 
Colin Anderson/BrandX/Age fotostock slightly greater than 0.03.) 


QSOS An SRS of 1068 registered voters is adequate for margin of error 
+3%. mI 
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If you want a 2.5% margin of error rather than 3%, then (after rounding up) 


1.96 °Y 


For a 2% margin of error, the sample size you need is 


n= (=) (0.5)(1 — 0.5) = 2401 


As usual, smaller margins of error call for larger samples. 


() LaunchPad Online Resources 


* The StatBoards video, Computing a Sample Size for One Proportion, provides 
an additional example of choosing the sample size to provide a given margin 
of error. 


Apply Your Knowledge 


22.7 Did You Use a Mobile Device? The Monterey Bay Aquarium, founded in 
1984, is situated on the beautiful coast of Monterey Bay in the historic Can- 
nery Row district. From 2009 to 2013, the aquarium conducted a survey of 
a random sample of visitors as they exited the museum. The survey includes 
visitor demographic information, use of social media, and opinions on their 
aquarium visit. In 2013, the survey included 165 visitors over 65, of which 42 
used a mobile device such as an Android phone or iPad during their visit.'® 
(a) What is the margin of error of the large-sample 95% confidence 
interval for the proportion of visitors over 65 who used a mobile device 
during their visit? 

(b) How large a sample is needed to get the common +3 percentage point 
margin of error? Use the sample from part (a) as a pilot study to get p*. 


22.8 Can You Taste PTC? PTCis a substance that has a strong bitter taste for some 
people and is tasteless for others. The ability to taste PTC is inherited. About 

75% of Italians can taste PTC, for example. You want to estimate the proportion 

of Americans with at least one Italian grandparent who can taste PTC. 

(a) Starting with the 75% estimate for Italians, how large a sample must 
you collect in order to estimate the proportion of PTC tasters within 
+0.04 with 90% confidence? 

(b) Estimate the sample size required if you made no assumptions about 
the value of the proportion who could taste PTC. How much has the 
required sample size changed? 


The test statistic for the null hypothesis Ho: p = po is the sample proportion p 
standardized using the value po specified by Ho, 


D — po 
po(1 — po) 


n 


z= 


This z statistic has approximately the standard Normal distribution when Hp is 
true. P-values therefore come from the standard Normal distribution. Unlike the 
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confidence interval in which p is unknown and must be estimated by f when stan- 
dardizing the estimate, in the test we can replace p by po when standardizing as po 
is specified by Hp. Additionally, because Hp fixes a value of p when standardizing 
the estimate, the sample size conditions for use of the test are less stringent than 
for the large-sample confidence interval in which p must be estimated. Here is the 
procedure for tests. 


Significance Tests for a Proportion 


Draw an SRS of size n from a large population that contains an unknown proportion 
p of successes. To test the hypothesis Hp: p = po, compute the z statistic 


P — do 
pol = Po) 


n 


z= 


In terms of a variable Z having the standard Normal distribution, the approximate 
P-value for a test of Hy against 


bap > py is RBZx 


> eo 
Hp > po is P(Zz) eo 
- a 


Hyp # po is 2P(Z=|z|) 


Use this test when the sample size n is so large that both npp and n(1 — po) are 10 or 
more.'! 


The four-step process for any significance test is outlined on page 401. 


(STATE!) We hear that newborn babies are more likely to be boys than girls, pre- 
sumably to compensate for higher mortality among boys in early life. Is this true? A 
random sample found 13,173 boys among 25,468 firstborn children.'* The sample 
proportion of boys was 


13,173 
25,468 


p= = 0.5172 

Boys do make up more than half of the sample, but of course we don’t expect a 
Gr SMEs... s perfect 50—50 split in a random sample. Is this sample evidence that boys are more 
© Bibine Hamann MC hts common than girls in the entire population? 
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“PLAN: Take p to be the proportion of boys among all firstborn children of American 
mothers. (Biology says that this should be the same as the proportion among all 
children, but the survey data concern first births.) 

We want to test the hypotheses 


Ho: p = 0.5 

Hep > 05 
EXSRVEA The conditions for inference require that we have a random sample and 
that npo = (25,468)(0.5) = 12,734 and n(1 — po) = (25,468)(0.5) = 12,734 


are both greater than 10. Since the conditions for inference are met, we can go on 
to find the z test statistic: 


A 


P — Po 
Po( 1 — Po) 

n 
0.5172 -0.5 _ 
(0.5)(0.5) 
25,468 


The P-value is the area under the standard Normal curve to the right of z = 5.49. We 
know that this is very small; Table C shows that P < 0.0005. Minitab (Figure 22.3) 
says that P is O to three decimal places. 


(SSMU There is very strong evidence that more than half of firstborns are 
boys (P < 0.001). = 


Session 


Test and Cl for One Proportion 


Test of p = 0.5 p > 0.5 


95% lower FIGURE 22.3 
Sample x N bound Z-Value P-Value Minitab output for the significance test 
Ae 13173 25468 0.512087 5.50! 0.000 of Example 22.6. Roundoff error in 


‘ F : Example 22.6 explains the small 
Using the normal approximations. : : 
difference (5.49 versus 5.50) in the 
values of the z statistic. 


With 13,173 successes in 25,468 trials, we have at least 15 successes and 15 failures 


in the sample. The conditions for the large-sample confidence interval are easily met. 
The 99% confidence interval is 


. p(1 —p 0.5172)(0.4828 
+ z* pil =e) P) = 0.5172 + 25764)! ut ) 
n 25,468 


= 0.5172 + 0.0081 
= 0.5091 to 0.5253 
We are 99% confident that between about 51% and 52.5% of first children are boys. 


The confidence interval is more informative than the test in Example 22.6, which 
tells us only that more than half are boys. = 
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() LaunchPad Online Resources 


* The StatBoards video, Hypothesis Test for One Proportion, illustrates the 
computation of a large-sample test for a proportion through an example. 


Apply Your Knowledge 


22.9 Spinning Euros. All euros have a national image on the “heads” side and a 
4. common design on the “tails” side. Spinning a coin, unlike tossing it, may not 
give heads and tails equal probabilities. Polish students spun the Belgian euro 
250 times, with its portly king, Albert, displayed on the heads side. The result 
was 140 heads. How significant is this evidence against equal probabilities? 
Follow the four-step process as illustrated in Example 22.6. 


22.10 Vote for the Best Face? We often judge other people by their faces. It 

42 appears that some people judge candidates for elected office by their faces. 

Psychologists showed head-and-shoulders photos of the two main candidates 

in 32 races for the U.S. Senate to many subjects (dropping subjects who rec- 

ognized one of the candidates) to see which candidate was rated “more com- 

petent” based on nothing but the photos. On election day, the candidates 

whose faces looked more competent won 22 of the 32 contests. If faces 

don’t influence voting, half of all races in the long run should be won by the 

candidate with the better face. Is there evidence that the candidate with the 

better face wins more than half the time? Follow the four-step process as il- 
lustrated in Example 22.6. 


22.11 No Test. Explain whether we can use the z test for a proportion in these 
situations. 

(a) You toss a coin 10 times in order to test the hypothesis Ho: p = 0.5 
that the coin is balanced. 

(b) A local candidate contacts an SRS of 900 of the registered voters in 
his district to see if there is evidence that more than half support the 
bill he is sponsoring. 

(c) Accollege president says, “99% of the alumni support my firing of Coach 
Boggs.” You contact an SRS of 200 of the college’s 15,000 living 
alumni to test the hypothesis Ho: p = 0.99. 


The large-sample confidence interval f + z*V (1 — p)/n for a sample pro- 
portion p is easy to calculate. It is also easy to understand because it rests directly on 
the approximately Normal distribution of f . Unfortunately, confidence levels from 
this interval can be inaccurate, particularly with smaller sample sizes. The actual 
confidence level is usually less than the confidence level you asked for in choosing 
the critical value z*. That’s bad. What is worse, accuracy does not consistently get 
better as the sample size n increases. There are “lucky” and “unlucky” combinations 
of the sample size n and the true population proportion p. 

Fortunately, there is a simple modification that is almost magically effective in 
improving the accuracy of the confidence interval. We call it the “plus four” method 


*This section is optional. 
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because all you need to do is add four imaginary observations: two successes and two 
failures. With the added observations, the plus four estimate of p is plus four estimate 


number of successes in the sample + 2 
n+ 4 


The formula for the confidence interval is exactly as before, with the new sample 
size and number of successes.” You do not need software that offers the plus four 
interval—just enter the new sample size (actual size + 4) and number of successes 
(actual number + 2) into the large-sample procedure. 


Plus Four Confidence Interval for a Proportion 


Draw an SRS of size n from a large population that contains an unknown proportion p of 
successes. To get the plus four confidence interval for p, add four imaginary observa- 
tions—two successes and two failures. Then use the large-sample confidence interval 
with the new sample size (n + 4) and number of successes (actual number + 2). 


Use this interval when the confidence level is at least 90% and the sample size n is at 
least 10, with any counts of successes and failures. 


STATE! Cocaine users commonly snort the powder up the nose through a rolled-up 
paper currency bill. Spain has a high rate of cocaine use, so it’s not surprising that 
euro paper currency in Spain often bears traces of cocaine. Researchers collected 20 
euro bills in each of several Spanish cities. ln Madrid, 17 out of 20 bore traces of 
cocaine.'® The researchers note that we can’t tell whether the bills had been used to 
snort cocaine or had been contaminated in currency-sorting machines. Estimate the 
proportion of all euro bills in Madrid that have traces of cocaine. 


_PLAN: Take p to be the proportion of bills that show cocaine traces. That is, a “suc- 
cess” is a bill that shows cocaine traces. Give a 95% confidence interval for p. 


ESO RV | is not clear how the bills in the sample were selected, so we don't know if 
we have an SRS. We will act as though we have an SRS, but we proceed with caution. 
The conditions for use of the large-sample interval are not met because there are only 
three failures. To apply the plus four method, add two successes and two failures to 
the original data. The plus four estimate of p is 


_ 174+2 ~=~°19 
P~30+4. 24 


= 0.7917 


We calculate the plus four confidence interval in the same way as we do the large- 


sample interval, but we base it on 19 successes in 24 observations. sl - 
Here it is: © Andrea Matone/Alamy 
p(1 — p 0.7917)(0.2083 
p= [PCP o7o17 = 1.960,/{7217102089) 
n+4 24 


= 0.7917 + 0.1625 
= 0.6292 to 0.9542 
| CONCLUDE: (AGS the sample can be regarded as an SRS, we estimate with 


95% confidence that between about 63% and 95% of all euro bills in Madrid bear 
traces of cocaine. = 
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For comparison, the ordinary sample proportion is 


~  L7 

p= 7 0.85 
The plus four estimate p = 0.7917 in Example 22.8 is farther away from 1 than 
p = 0.85. The plus four estimate gains its added accuracy by always moving toward 
0.5 and away from 1 or 0, whichever is closer. This is particularly helpful when the 
sample contains only a few successes or a few failures. The numerical difference 
between a large-sample interval and the corresponding plus four interval is often 
small. Remember that the confidence level is the probability that the interval will 
catch the true population proportion in very many uses. Small differences every 
time can add up to more accurate confidence levels from plus four versus the large- 
sample interval. 

How much more accurate is the plus four interval? Computer studies have asked 
how large n must be to guarantee that the actual probability that a 95% confidence 
interval covers the true parameter value is at least 0.94 for all samples of size n or 
larger. If p = 0.1, for example, the answer is n = 646 for the large-sample inter- 
val and n = 11 for the plus four interval." The consensus of computational and 
theoretical studies is that plus four is better than the large-sample interval for many 
combinations of n and p. 


Apply Your Knowledge 


Jennifer Shields/Getty Images 


22.12 Black Raspberries and Cancer. Sample surveys usually contact large 
samples, so we can use the large-sample confidence interval if the sample 
design is close to an SRS. Scientific studies often use smaller samples that 
require the plus four method. For example, Familial Adenomatous Polyposis 
(FAP) is a rare inherited disease characterized by the development of an 
extreme number of polyps early in life and colon cancer in virtually 100% of 
patients before the age of 40. A group of 14 people suffering from FAP being 
treated at the Cleveland Clinic drank black raspberry powder in a slurry of 
water every day for nine months. The numbers of polyps were reduced in 11 
out of 14 of these patients.'® 


(a) Why can't we use the large-sample confidence interval for the propor- 
tion p of patients suffering from FAP that will have the number of 
polyps reduced after nine months of treatment? 


(b) The plus four method adds four observations—two successes and two 
failures. What are the sample size and the number of successes after 
you do this? What is the plus four estimate p of p? 


(c) Give the plus four 90% confidence interval for the proportion of 
patients suffering from FAP who will have the number of polyps 
reduced after nine months of treatment. 


22.13 Computer/Internet-Based Crime. With over 50% of adults spending 
more than an hour a day on the Internet, the number experiencing computer 
or Internet-based crime continues to rise. A survey in 2010 of a random sample 
of 1025 adults, aged 18 and older, reached by random digit dialing found 113 
adults in the sample who said that they or a household member was a victim of 
a computer or Internet crime on their home computer in the past year.!? 


(a) Give the 95% large-sample confidence interval for the proportion p of 
all households that have experienced computer or Internet crime dur- 
ing the year before the survey was conducted. Be sure to verify that the 
sample size is large enough to use the large-sample confidence interval. 


(b) Give the plus four 95% confidence interval for p. If you express the two 
intervals in percents, rounded to the nearest tenth of a percent, how 
do they differ? (The plus four interval always pulls the results away 
from 0% or 100%, whichever is closer.) 


22.14 Cocaine Traces in Spanish Currency, Continued. The plus four 
method is particularly useful when there are no successes or no failures in the 
data. The study of Spanish currency described in Example 22.8 found that in 
Seville, all 20 of a sample of 20 euro bills had cocaine traces. 


(a) What is the sample proportion f of contaminated bills? What is the 
large-sample 95% confidence interval for p? It’s not plausible that every 
bill in Seville has cocaine traces, as this interval says. 

(b) Find the plus four estimate p and the plus four 95% confidence 
interval for p. These results are more reasonable in this situation. 


CHAPTER 22 SUMMARY 


Chapter Specifics 


Tests and confidence intervals for a population proportion p when the data are an SRS 
of size n are based on the sample proportion p. 


When n is large, f has approximately the Normal distribution with mean p and standard 
deviation V p(1 — p)/n. 
The level C large-sample confidence interval for p is 


: p(1 — p) 


where z* is the critical value for the standard Normal curve with area C between —z* 
and x*. Use this interval only when both the number of successes and the number of 
failures in the sample are at least 15. 


The sample size needed to obtain a confidence interval with approximate margin of 
error m for a population proportion is 


n=(©)pe =p" 


where p* is a guessed value for the sample proportion f, and z* is the standard Normal 
critical point for the level of confidence you want. If you use p* = 0.5 in this formula, 
the margin of error of the interval will be less than or equal to m no matter what the 
value of f is. 


Significance tests for Ho: p = po are based on the statistic 
pb — Po 
Po(1 — po) 


n 


with P-values calculated from the standard Normal distribution. Use this test in prac- 
tice when npp = 10 and n(1 — po) = 10. 


(Optional topic) To get a more accurate confidence interval for smaller sample sizes, 
add four imaginary observations, two successes and two failures, to your sample. Then 
use the same formula for the confidence interval. This is the plus four confidence 
interval. Use this interval in practice for confidence level 90% or higher and sample 
size n at least 10. 


Summary 
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KIDS ON BIKES 

In the most recent year 
for which data are available, 77% of 
children killed in bicycle accidents 
were boys. You might take these data 
as a sample and start from p = 0.77 
to do inference about bicycle deaths 
in the near future. What you should 
not do is conclude that boys on bikes 
are in greater danger than girls. We 
don’t know how many boys and 
girls ride bikes—it may be that most 
fatalities are boys because most riders 
are boys. 


Inference about a Population Proportion 


Link It 


The methods of this chapter can be used to compute confidence intervals and test hy- 
potheses about a population proportion. This may be the proportion in a population with 
some attribute of interest such as the population proportion of young adults who use social 
networking sites. The proportion could also correspond to the probability of an outcome 
in an experiment such as the probability that a spinning coin will land on heads. Since the 
methods of this chapter are approximations, it is important always to check the conditions 
required for the approximations to work well, whether you are using the large sample con- 
fidence interval, the z test, or the plus four method. 

When making inferences about a proportion in a population, an important assumption 
for the methods of this chapter is that the data are an SRS from the population. This is 
the most difficult assumption to guarantee, because of the many difficulties associated with 
obtaining an SRS. These difficulties were described in Chapter 8. With nonresponse rates 
in many surveys that are over 80%, the final sample may not be representative of the popu- 
lation even when experimenters initially selected an SRS. Because of this, most surveys 
of large populations need to use more complicated sampling schemes as well as to make 
modifications in the estimates to adjust for problems such as nonresponse. 

In Chapters 20 and 21, we first considered inferences about a single mean, and then 
turned our attention to situations that required the comparison of two population means. 
In many instances, we want to compare two population proportions rather than making 
inferences about a single proportion. Methods to compare two population proportions will 
be described in the next chapter. 


(4) LaunchPad Online Resources 


If you are having difficulty with any of the sections of this chapter, these online 
resources should help prepare you to solve the exercises at the end of this chapter. 


° StatTutor starts with a video review of each section and asks a series of 
questions to check your understanding. 


* LearningCurve provides you with a series of questions about the chapter 
geared to your level of understanding. 
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22.15 


22.16 


The proportion of drivers who use seat belts de- 
pends on characteristics such as age, sex, and eth- 
nicity. As part of a broader study, investigators 
observed a random sample of 117 female Hispanic 
drivers in Boston. Suppose that in fact 60% of all fe- 
male Hispanic drivers in the Boston area wear seat 
belts. In repeated samples, the sample proportion p 
would follow approximately a Normal distribution 
with mean 


(a) 70.2. (b) 0.6. (c) 0.002. 
The standard deviation of the distribution of p in 
Exercise 22.15 is about 


(a) 0.002. (b) 0.045. (c) 0.24. 


Use the following information for Exercises 22.17 through 22.19. 
A 2011 NBC News survey found that 80% of a sample of 4500 
American teens said they owned an MP3 player such as an iPod. 
Assume that the sample was an SRS. 
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22.18 


22.19 


22.20 


Based on the sample, the large-sample 90% confi- 
dence interval for the proportion of all American 
teens who own an MP3 player is 


(a) 0.80 + 0.0060. (b) 0.80 + 0.0098. 
(c) 0.80 + 0.0117. 


In Exercise 22.17, suppose we computed a large- 
sample 80% confidence interval for the proportion 
of all American teens who own an MP3 player. This 
80% confidence interval 


(a) would have a smaller margin of error than the 
90% confidence interval. 

(b) would have a larger margin of error than the 
90% confidence interval. 

(c) could have either a smaller or a larger margin of 
error than the 90% confidence interval. This varies 
from sample to sample. 


How many American teens must be interviewed to 
estimate the proportion who own an MP3 player 
within +0.02 with 99% confidence using the large- 
sample confidence interval? Use 0.5 as the conserva- 
tive guess for p. 


(a) n = 1692 (b) n = 2401 (c) n = 4148 
An opinion poll asks an SRS of 100 college seniors 


how they view their job prospects. In all, 53 say 
“Good.” The large-sample 95% confidence interval 
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for estimating the proportion of all college seniors 
who think their job prospects are good is 


(a) 0.530 + 0.082. (b) 0.530 + 0.098. 
(c) 0.530 + 0.049. 


The sample survey in Exercise 22.20 actually called 
130 seniors, but 30 of the seniors refused to answer. 
This nonresponse could cause the survey result to be 
in error. The error due to nonresponse 


(a) is in addition to the margin of error found in 
Exercise 22.20. 

(b) is included in the margin of error found in 
Exercise 22.20. 

(c) can be ignored because it isn’t random. 


Experiments on learning in animals sometimes measure 
how long it takes mice to find their way through a maze. 
Only half of all mice complete one particular maze in 
less than 18 seconds. A researcher thinks that a loud 
noise will cause the mice to complete the maze faster. 
She measures the proportion of 40 mice that completed 
the maze in less than 18 seconds with noise as a stimu- 
lus. The proportion of mice that completed the maze in 
less than 18 seconds is | = 0.7. The hypotheses for a 
test to answer the researcher’s question are 


(a) Ho: p = 0.5, Hy: p > 0.5. 
(b) Ho: p = 0.5, Hy: p < 0.5. 
(c) Ho: p = 0.5, Hz: p ¥ 0.5. 


The value of the z statistic for Exercise 22.22 is 2.53. 
This test is 


(a) not significant at either a = 0.05 ora = 0.01. 
(b) significant at a = 0.05 but not at a = 0.01. 
(c) significant at both a = 0.05 and a = 0.01. 


A Gallup Poll in November 2012 found that 54% 
of the people in the sample said they wanted to lose 
weight. The poll’s margin of error for 95% confi- 
dence was 4%. This means that 


(a) the poll used a method that gets an answer 
within 4% of the truth about the population 95% 
of the time. 

(b) we can be sure that the percent of all adults 
who want to lose weight is between 50% and 58%. 
(c) if Gallup takes another poll using the same 
method, the results of the second poll will lie 
between 50% and 58%. 
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22.25 Dosmokers know that smoking is bad for them? The 


Harris Poll asked a sample of smokers, “Do you believe 


that smoking will probably shorten your life, or not?” Of 


the 1010 people in the sample, 848 said “yes.” 


(a) Harris called residential telephone numbers at ran- 
dom in an attempt to contact an SRS of smokers. Based 
on what you know about national sample surveys, what 
is likely to be the biggest weakness in the survey? 
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(b) We will nonetheless act as if the people inter- 
viewed are an SRS of smokers. Give a 95% confi- 
dence interval for the percent of smokers who agree 
that smoking will probably shorten their lives. 


alessandro0770/Getty Images 


Reporting cheating. Students are reluctant to 
report cheating by other students. A student project 
put this question to an SRS of 172 undergraduates at 
a large university: “You witness two students cheat- 
ing on a quiz. Do you go to the professor?” Only 19 
answered “yes.””° Give a 95% confidence interval for 
the proportion of all undergraduates at this univer- 
sity who would report cheating. 


Harris announces a margin of error. Exercise 22.25 
describes a Harris Poll survey of smokers in which 
848 of a sample of 1010 smokers agreed that smoking 
would probably shorten their lives. Harris announces 
a margin of error of +3 percentage points for all 
samples of about this size. Opinion polls announce 
the margin of error for 95% confidence. 


(a) What is the actual margin of error (in percent) 
for the large-sample confidence interval from this 
sample? 

(b) The margin of error is largest when fp = 0.5. 
What would the margin of error (in percent) be if 
the sample had resulted in f = 0.5? 

(c) Why do you think that Harris announces a 
+3% margin of error for all samples of about this 
size? 


Sampling voters. Your state representative is plan- 
ning on supporting a bill legalizing same-sex marriage 
in your state and wants to estimate the proportion of 
voters in her district who support this bill. She mails 
a survey to an SRS of 1300 registered voters in her 
district, and 800 surveys are returned. In the sample 
of 800 surveys returned to your state representative, 
528 are from female voters. 


(a) In her district, voter registration records show 
that 54% of the registered voters are female. If 

the 800 returned surveys are an SRS of registered 
voters, do you feel that the difference between the 
proportion of female voters that responded to the 
survey and the population proportion of 54% can 
be easily explained by chance variation? State the 
hypotheses and give the P-value. Give your conclu- 
sion in the context of the problem. 
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(b) Among the 800 surveys that are returned, 72% 
support the bill. If women are more likely to support 
the bill legalizing same-sex marriage than men, 
what would be the likely direction of the bias in the 
estimate of the proportion of registered voters who 
support the bill? Explain your answer. 


Internet searches and cell phones. Pew Research 
Center’s Internet and American Life Project asked 
a random sample of 2485 cell phone users whether 
they had used their cell phone to look up health or 
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medical information. Of these, 422 said “yes.”” 


(a) Pew dialed cell phone telephone numbers at ran- 
dom in the continental United States in an attempt 
to contact a random sample of adults. Based on 
what you know about national sample surveys, what 
is likely to be the biggest weakness in the survey? 

(b) Act as if the sample is an SRS. Give a large- 
sample 90% confidence interval for the proportion 
p of all cell phone users who have used their cell 
phone to look up health or medical information. 

(c) Three out of the five most popular health 
related searches on cell phones have to do with 

sex: “pregnancy,” “herpes,” and “STD” (sexually 
transmitted diseases). Sex-related queries don’t 
even show up on Google and Yahoo’s list of top 

five health searches on computers. What do you 
think explains the difference in the topics for health 
related searches on cell phones versus computers? 
When drawing conclusions from a sample, you must 
always be careful about the relevant population. 


Gastric bypass surgery. How effective is gastric bypass 
surgery in maintaining weight loss in extremely obese 
people? A Utah-based study conducted between 2000 
and 2011 found that 76% of 418 subjects who had 
received gastric bypass surgery maintained at least a 
20% weight loss six years after surgery.” 


(a) Are the conditions for the use of the large- 
sample confidence interval met? Explain. 

(b) Give a 90% confidence interval for the propor- 
tion of those receiving gastric bypass surgery that 
maintained at least a 20% weight loss six years after 
surgery. 

(c) Interpret your interval in the context of the 
problem. 


Running red lights. A random digit dialing tele- 
phone survey of 880 drivers asked, “Recalling the 
last ten traffic lights you drove through, how many 
of them were red when you entered the intersections?” 
Of the 880 respondents, 171 admitted that at least 
one light had been red.” 


(a) Give a 95% confidence interval for the propor- 
tion of all drivers who ran one or more of the last 
10 red lights they met. 
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(b) Nonresponse is a practical problem for this 
survey—only 21.6% of calls that reached a live 
person were completed. Another practical problem 
is that people may not give truthful answers. What 
is the likely direction of the bias: do you think more 
or fewer than 171 of the 880 respondents really ran 
ared light? Why? 


The IRS plans an SRS. The Internal Revenue Ser- 
vice plans to examine an SRS of individual federal 
income tax returns from each state. One variable of 
interest is the proportion of returns claiming item- 
ized deductions. The total number of tax returns in a 
state varies from more than 15 million in California 
to fewer than 250,000 in Wyoming. 


(a) Will the margin of error for estimating the 
population proportion change from state to state if 
an SRS of 2000 tax returns is selected in each state? 
Explain your answer. 

(b) Will the margin of error change from state to 
state ifan SRS of 1% of all tax returns is selected in 
each state? Explain your answer. 


Surveying students. You are planning a survey of 
students at a large university to determine what 
proportion favor an increase in student fees to sup- 
port an expansion of the student newspaper. Using 
records provided by the registrar, you can select a 
random sample of students. You will ask each stu- 
dent in the sample whether he or she is in favor of 
the proposed increase. Your budget will allow a sam- 
ple of 100 students. 


(a) For a sample of size 100, construct a table of the 
margins of error for 95% confidence intervals when 
p takes the values 0.1, 0.3, 0.5, 0.7, and 0.9. 

(b) A former editor of the student newspaper offers 
to provide funds for a sample of size 500. Repeat the 
margin of error calculations in part (a) for the larger 
sample size. Then write a short thank-you note to 
the former editor describing how the larger sample 
size will improve the results of the survey. 


In responding to Exercises 22.34 to 22.41, follow the Plan, Solve, 
and Conclude steps of the four-step process. 


22.34 


College-educated parents. The National Assess- 
ment of Educational Progress (NAEP) includes a 
“long-term trend” study that tracks reading and 
mathematics skills over time, and obtains demo- 
graphic information. In the 2012 study, a random 
sample of 9000 17-year-old students was selected.”4 
The NAEP sample used a multistage design, but the 
overall effect is quite similar to an SRS of 17-year- 
olds who are still in school. 

(a) In the sample, 51% of students had at least one 
parent who was a college graduate. Estimate, with 
99% confidence, the proportion of all 17-year-old 


22.35 
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students in 2012 who had at least one parent gradu- 
ate from college. 

(b) The sample does not include 17-year-olds who 
dropped out of school, so your estimate is only 

valid for students. Do you think the proportion of 
all 17-year-olds with at least one parent who was 

a college graduate would be higher or lower than 
51%? Explain. 

Downloading music. A husband and wife, Ed and 
Rina, share a digital music player that has a feature 
that randomly selects which song to play. A total of 
3476 songs have been loaded into the player, some 
by Ed and the rest by Rina. They are interested in 
determining whether they have loaded different pro- 
portions of songs into the player. Suppose that when 
the player was in the random-selection mode, 34 
of the first 50 songs selected were songs loaded by 
Rina. Let p denote the proportion of songs that were 
loaded by Rina. 


(a) State the null and alternative hypotheses to be 
tested. How strong is the evidence that Ed and Rina 
have loaded different proportions of songs into the 
player? Make sure to check the conditions for the 
use of this test. 

(b) Are the conditions for the use of the large- 
sample confidence interval met? If so, estimate with 
95% confidence the proportion of songs that were 
loaded by Rina. 


The boomerang generation. The “boomerang gen- 
eration” is a term applied to the current generation 
of young adults. They are so named because of the 
frequency with which they choose to live with their 
parents after a brief period of living on their own, 
thus boomeranging back to their place of origin. A 
Pew Research Center survey of 808 adults nation- 
wide conducted December 6-19, 2011, found that 
39% of 18- to 34-year-olds lived with their parents or 
had moved back in temporarily because of the econ- 
omy in the past few years. At the time the poll was 
conducted, what can you say with 95% confidence 
about the percent of all 18- to 34-year-olds in the 
United States who lived with their parents or had 
moved back in temporarily because of the economy? 
Do cash incentives improve learning? A high- 
school teacher in a low-income urban school in 
Worcester, Massachusetts, used cash incentives to 
encourage student learning in his AP statistics class.” 
In 2010, 15 of the 61 students enrolled in his class 
scored a 5 on the AP statistics exam. Worldwide, the 
proportion of students who scored a 5 in 2010 was 
0.15. Is this evidence that the proportion of students 
who would score a 5 on the AP statistics exam when 
taught by the teacher in Worcester using cash incen- 
tives is higher than the worldwide proportion of 0.15? 
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(a) State hypotheses, find the P-value, and give 
your conclusions in the context of the problem. Do 
you have any reservations about using the z test for 
proportions for this data? 

(b) Does this study provide evidence that cash 
incentives cause an increase in the proportion of 5s 
on the AP statistics exam? Explain your answer. 


Final election poll. Gallup’s final presidential elec- 
tion poll was conducted November 1-4, 2012. The 
poll included a random sample of 2551 adults, aged 
18 and over, living in all 50 U.S. states and the Dis- 
trict of Columbia, who were identified as likely to 
vote. Forty-eight percent of these likely voters said 
they would vote for Barack Obama. Does this pro- 
vide evidence that, among all likely voters at the 
time the poll was conducted, less than half supported 
Barack Obama? 


Order in choice. Does the order in which wine is 
presented make a difference? Several choices of wine 
are presented one at a time and in sequence and the 
subject is then asked to choose the preferred wine 
at the end of the sequence. In this study, subjects 
were asked to taste two wine samples in sequence. 
Both samples given to a subject were the same wine, 
although subjects were expecting to taste two differ- 
ent samples of a particular variety. Of the 32 subjects 
in the study, 22 selected the wine presented first, 
when presented with two identical wine samples.” 


(a) Do the data give good reason to conclude that 
the subjects are not equally likely to choose either 
of the two positions when presented with two iden- 
tical wine samples in sequence? 

(b) The subjects were recruited in Ontario, Canada, 
via advertisements to participate in a study of “at- 
titudes and values towards wine.” Can we generalize 
our conclusions to all wine tasters? Explain. 


Chick-fil-A gets it right. Which fast food chain 
fills orders most accurately at the drive-thru win- 
dow? The Quick Service Restaurant (QSR) maga- 
zine drive-thru study visits restaurants in the largest 
fast food chains in all 50 states. All visits occurred 
during the lunch hours of 11:00 A.M. to 2:30 P.M. or 
during the dinner hours of 4:00 p.m. to 7:00 P.M. 
During each visit, the researcher ordered a main 
item, a side item, and a drink. One item was left off 
of each order; for example, a field researcher could 
order a sandwich with no pickles. After receiving 
the order, all food and drink items were checked for 
complete accuracy. Any food or drink item received 
that was not exactly as ordered resulted in the order 
being classified as inaccurate. Also included in 
the measurement of accuracy were condiments 
asked for, napkins, straws, and correct change. Any 
errors in these resulted in the order being classified 
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as inaccurate. In 2012, Chick-fil-A had the fewest 
inaccuracies, with only 23 of 274 orders classified 
as inaccurate.’” What proportion of orders are filled 
accurately by Chick-fil-A? (Use 95% confidence.) 


© Julie Dermansky/Corbis 


Order in choice: planning a study. How large a 
sample would be needed to obtain margin of error 
+0.05 in the study of choice order for tasting wine? 
Use the p from Exercise 22.39 as your guess for the 
unknown p. 


Exercises 22.42 to 22.44 concern the optional material on the plus 
four method. 
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Order in choice. Does the order in which wine is 
presented make a difference? Several choices of wine 
are presented one at a time and in sequence and the 
subject is then asked to choose the preferred wine 
at the end of the sequence. In this study, subjects 
were asked to taste two wine samples in sequence. 
Both samples given to a subject were the same wine, 
although subjects were expecting to taste two differ- 
ent samples of a particular variety. Of the 32 subjects 
in the study, 22 selected the wine presented first, 
when presented with two identical wine samples.” 


(a) Although the conditions for the large-sample 
test were met in Exercise 22.39, show that the 
conditions for the large-sample confidence interval 
discussed in this chapter are not met. 

(b) Are the conditions for the use of the plus four 
confidence interval met? If so, use the plus four 
method to give a 90% confidence interval for the 
proportion of subjects who would select the first 
choice presented. 


Shrubs that survive fires. Some shrubs have the 
useful ability to resprout from their roots after their 
tops are destroyed. Fire is a particular threat to shrubs 
in dry climates, as it can injure the roots as well as 
destroy the aboveground material. One study of 
resprouting took place in a dry area of Mexico.” The 
investigators clipped the tops of samples of several 
species of shrubs. In some cases, they also applied 
a propane torch to the stumps to simulate a fire. Of 
12 specimens of the shrub Krameria cytisoides, five 
resprouted after fire. Estimate with 90% confidence 
the proportion of all shrubs of this species that will 
resprout after fire. 
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Prayer among the Millennials. The Millennial gen- 
eration (so called because they were born after 1980 
and began to come of age around the year 2000) are 
less religiously active than older Americans. One of 
the questions in the General Social Survey in 2012 
was, “How often does the respondent pray?” Among 
the 457 respondents in the survey between 18 and 
32 years of age, 289 prayed at least once a week.*° 
Assume that the sample is an SRS. 
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the large-sample 99% confidence interval for the 
proportion p of all adults between 18 and 32 years 
of age who pray at least once a week? 

(b) Give the plus four 99% confidence interval 
for p. If you express the two intervals in percents 
and round to the nearest tenth of a percent, how 
do they differ? (As always, the plus four method 
pulls results away from 0% or 100%, whichever is 
closer.) 


(a) Verify that the sample size conditions are met 
for the large-sample confidence interval. What is 
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Where are the smokers? The Behavioral Risk Factor Surveillance System 
(BRFSS) is an ongoing data collection program designed to measure behavioral 
risk factors for the adult population (18 years of age or older) living in households. 
Data are collected from a random sample of adults (one per household) through 
a telephone survey. Go to the website apps.nccd.cdc.gov/BRFSS/ and 
under Category go to Tobacco Use. Under the topic Adults Who Are Current 
Smokers, you will find the percentage of smokers in each state. 


(a) Which state has the highest percentage of smokers, and what is the 
reported value? Which state has the lowest percentage, and what is its value? Are 
the reported percentages statistics or parameters? 


(b) Choose a state of interest to you and click on the link. In the table that 
opens, there is a line for n and the entries are the numbers who answered “yes” 
and “no.” Find the percentage in the sample who answered “yes.” Notice that 
it is different than the percentage reported in the table. The table estimates are 
weighted to try to reduce the bias. If it is determined that certain portions of the 
population are underrepresented in the sample, then that portion of the sample 
receives more weight when computing the estimate of the percentage. The as- 
sumptions for an SRS are rarely met in practice, and more complicated methods 
are often necessary to estimate proportions. 


More on weighting of estimates. The website for the American Association 
for Public Opinion Research discusses several issues about polls. Read the discus- 
sion of weighting at http: //www.aapor.org/Weightingl.htm. The 
necessity of weighting arises in Exercise 22.28 (page 534). Here are the important 
points. 

A state representative wants to estimate the proportion of voters in her dis- 
trict who support a bill legalizing same-sex marriage. In her district, it is known 
that 54% of the registered voters are female. She mails a survey to an SRS of 1300 
registered voters, and 800 surveys are returned. Among the 800 surveys returned, 
66% were from women, and the women tended to support the bill more than the 
men. Because the women are both overrepresented in the sample compared with 
the population of registered voters and have stronger support for the bill, the 
proportion in the sample that support the bill will tend to be larger than in the 
population, producing bias in the estimate. 


(a) The online discussion of weighting gives three uses of weighting to adjust 
poll results. Which of these three uses is relevant in this situation? 
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Here is a breakdown of the respondents. 


SUPPORT THE BILL 
YES NO TOTAL 


Females 422 106 528 
Males 154 118 272 
Total 576 224 800 


(b) What is the proportion of females in the sample, fy, that support the bill 
and the proportion of males in the sample, fy, that support the bill? 

(c) If the sample proportions f; and fy were equal to the true proportions of 
females and males that support the bill, what would be the true proportion of all 
registered voters in the population who support the bill? This is a weighted esti- 
mator. (Hint: You will need to use the fact that 54% of the population is female 
and 46% is male in your computation.) 

(d) The weighted estimator is smaller than the unweighted estimator 
pf = 576/800 = 0.72. Why is this the correct direction to adjust the estimator? 
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23.1 Two-sample problems: 
proportions 


= (O DO tl O ) S 23.2. The sampling distribution 


of a difference between 
proportions 


23.3 Large-sample confidence 
intervals for comparing 
proportions 


23.4 Using technology 


two-sample problem can arise from a randomized comparative experiment 23.5 Significance tests for 

that randomly divides subjects into two groups and exposes each group to a comparing proportions 

different treatment. Comparing random samples separately selected from two 23.6 Plus four confidence intervals 
populations is also a two-sample problem. When the comparison involves the means for comparing proportions* 


of two populations, we use the two-sample t methods of Chapter 21. In this chapter, 
we consider two-sample problems in which the measurement on the individual can 
be categorized as either a success or a failure. Our goal is to compare the proportions 
of successes in the two populations. 


23.1 Two-sample problems: proportions 


Here are some questions that you will answer in the exercises of this chapter. 


s for Proportions Tei 


Does the proportion of males who have used the Internet to obtain information 
about a specific disease or medical problem in the last year differ from the 
proportion of females who have done so? A sample of males and a sample of 
females are obtained, and the proportions in each sample who have used the 
Internet to obtain information about a specific disease or medical problem in the 
last year are to be compared (see Exercise 23.1). 
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Does hand washing with alcohol-based hand sanitizers reduce the risk of infection 
for the common cold? A randomized experiment assigned 100 subjects to a 
treatment group that followed a regimen of hand washing with alcohol-based 
sanitizers and 100 subjects to a control group that used routine hand washing 
without alcohol-based sanitizers. After 10 weeks, compare the proportions in the 
two groups who were infected with the common cold virus over the study period 
(see Exercise 23.35). m 


BQ computer- 
A ASSISTED 
INTERVIEWING 


The days of the interviewer with a 
clipboard are past. Interviewers now 
read questions from a computer 
screen and use the keyboard to 

enter responses. The computer skips 
irrelevant items—once a woman 

says that she has no children, further 
questions about her children never 
appear. The computer can even 
present questions in random order to 
avoid bias due to always following the 
same order. Software keeps records of 
who has responded and prepares a file 
of data from the responses. The tedious 
process of transferring responses from 
paper to computer, once a source of 
errors, has disappeared. 


We will use notation similar to that used in our study of two-sample t statistics. 
The groups we want to compare are Population 1 and Population 2. We have a sepa- 
rate SRS from each population or responses form two treatments in a randomized 
comparative experiment. A subscript shows which group a parameter or statistic 
describes. Here is our notation: 


Population Sample Sample 
Population Proportion Size Proportion 
1 Pi ny Di 
2 p2 0) p2 


We compare the populations by doing inference about the difference p; — p? 
between the population proportions. The statistic that estimates this difference is 
the difference between the two sample proportions, fp; — p>. 
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STATE? “Would you date a person of a different race?” Researchers answered this 
question by collecting data from the Internet dating site Match.com. When people 
post profiles on the site, they indicate which races they are willing to date. While sev- 
eral races were studied, we focus on the data collected for black people dating white 
people. A random sample of 100 black males and a random sample of 100 black 
females were selected from the dating site, with 75 of the black males indicating 
their willingness to date white females and 56 of the black females indicating their 
willingness to date white males.' Is this good evidence that different proportions of 
black males and females on this Internet dating site would be willing to date someone 
who is white? How large is the difference between the proportions of black males 
and females who would be willing to date someone who is white? 


PLAN: Take black males to be Population 1 and black females to be Population 2. 
The population proportions who are willing to date someone who is white are p, for 
black males and pz for black females. We want to test the hypotheses 


= ps = 0) 
H,: P} % P2 (the same as H,: p; — p2 # O) 


Ho: Pp: = Po (the same as Ho: p; 


We also want to give a confidence interval for the difference p,; — po. 


ESQ RVI Inference about population proportions is based on the sample proportions 


R 75 

Pi= 400 = 0.75 (men) 

rn 56 

P2 = —~=0.56 (women) 
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We see that 75% of black males but only 56% of black females would be willing to 
date someone who is white. Because the sample sizes are of moderate size and the 
sample proportions are quite different, we expect that a test will be highly significant 
(in fact, P = 0.0046). So we concentrate on the confidence interval. To estimate 
P; — P2, Start from the difference between sample proportions 


py — Pp = 0.75 — 0,56 = 0.19 


To complete the Solve step, we must know how this difference behaves. m 


To use fp; — p> for inference, we must know its sampling distribution. Here are 
the facts we need: 


= When the samples are large, the distribution of ), — p2 is approximately 
Normal. 


= The mean of the sampling distribution is p; — p;. That is, the difference 
between sample proportions is an unbiased estimator of the difference 
between population proportions. 


» The standard deviation of the distribution is 


fee — pi) 4: pa(1 — p2) 


ny nN 


Figure 23.1 displays the distribution of f,; — fp). The standard deviation of 
fh, — p2 involves the unknown parameters p; and p). Just as in Chapter 22, we must 
replace these by estimates to make inferences. And just as in Chapter 22, we do this 
a bit differently for confidence intervals and for hypothesis tests. 


Standard deviation 


Sampling distribution 
of By — py [ae rs p2(1— pr) 


= Mm ny 


Mean p;— p> 


A aA 
<—_ Values of p; -— pp ———> 


FIGURE 23.1 

Select independent SRSs from two populations having proportions of successes p, and p>. The proportions 
of successes in the two samples are p; and p>. When the samples are large, the sampling distribution of the 
difference p,— P2 is approximately Normal. 
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standard error 


To obtain a confidence interval, replace the population proportions p, and p, in 
the standard deviation by the sample proportions. The result is the standard error 


of the statistic p; — po: 


TTF TTF 
Bee fe a) i, pol po) 


ny nN 


The confidence interval has the same form we met in Chapter 22: 


estimate © 7*SEccimate 


Large-Sample Confidence Interval 


for Comparing Two Proportions 


Draw an SRS of size n; from a large population having proportion p, of successes, and 
draw an independent SRS of size n from another large population having propor- 
tion p2 of successes. When n; and n; are large, an approximate level C confidence 
interval for p, — pz is 


(bi — Br) + x*SE5,-5, 
In this formula, the standard error SE;,-», of f: — po is 


TS: at 
Sees (2 bi) ii p21 — p2) 


ny nN 


and z* is the critical value for the standard Normal density curve with area C between 
=z" andiz*. 

Use this interval only when the numbers of successes and failures are each 10 or more 
in both samples. 


We can now complete Example 23.2. Here is a summary of the basic information: 


Population Sample Number of Sample 
Population Description Size Successes Proportion 
1 black males n, = 100 75 fb, = 75/100 = 0.75 
2 black females n, = 100 56 fp. = 56/100 = 0.56 


EONS We will give a 95% confidence interval for p; — p2, the difference between 
the proportions of black males and black females who would be willing to date some- 
one who is white. To check that the large-sample confidence interval is safe to use, 
look at the counts of successes and failures in the two samples. All these four counts 
are larger than 10, so the large-sample method will be accurate. The standard error is 


~7 = <7 5 
Shine a p1) 4: Pol Po) 


ny No 
7 = (0.56)(0.44) 
100. 100 


= V 0.004339 = 0.0659 
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The 95% confidence interval is 
(6; — Po) + 2*SE;_s, = (0.75 — 0.56) + (1.960)(0.0659) 


pi—Pe 


= 0.19 + 0.13 
= 0.06 to 0.32 


(SONS We are 95% confident that the percent of black males willing to date 
white women is between 6 and 32 percentage points higher than the percent of 
black females who are willing to date white men on comparable Internet dating 
sites. Even with sample sizes of 100 in each group, the resulting confidence interval 
0.06 to 0.32 is quite wide. As with a single proportion, fairly large sample sizes are 
required to obtain narrow confidence intervals. In a similar study by the authors, it 
was found that white men were more willing than white women to date someone who 
is black. Note that our conclusions are restricted to comparable Internet dating sites 
and don’t necessarily reflect dating behavior of the general population, as individuals 
on dating sites may not be representative of the general population. = 


Figure 23.2 displays software output for Example 23.3 from a graphing calcula- 
tor and two statistical software programs. As usual, you can understand the output 
even without knowledge of the program that produced it. Minitab gives the test as 


Graphing Calculator Minitab 


@ Session _-|5) x! 


Test and Cl for Two Proportions 


je-FroreInt] 
€.0609;.3191) 
Pi=.75 


= 
Sample xX N Sample p 
1 75 100 0.750000 
2 56 100 0.560000 


Difference = p (1) - p (2) 

Estimate for difference: 0.19 

95% CI for difference: (0.0608950, 0.319105) 

Test for difference = 0 (vs not = 0): Z = 2.83 P-Value = 0.005 


Crunchlt! 


Results - Proportion 2-Sample “|X 


| | Results - Proportion 2-Sample *\|x 


Export + 


Pe fons [ove] 


0.1900 


Exporty 


Null hypothesis: Difference of proportion = 0 


Alternative hypothesis: Difference of proportion is not 0 


De ae JP 
Sample 1 0.7500 
Sample 2 0.5600 


Difference: 0.1900 


Difference: 


StdErr: 0.06587 


95% Conflint: (0.06090, 0.3191) 


PooledStdErr: 0.06723 


z Statistic: 2.826 FIGURE 23.2 
Output from a graphing calculator, 
Minitab, and Crunchlit! for the 95% 
confidence interval of Example 23.3. 


P-value: 0.004710 
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FAM THE COOKIE 
Ui STRIKES 


How many different people clicked 
on your business website last month? 
Technology tries to help: when 
someone visits your site, a little piece 
of code called a cookie is left on their 
computer. When the same person 
clicks again, the cookie says not to 
count them as a “unique visitor” 
because this isn’t their first visit. 

But lots of web users delete cookies, 
either by hand or automatically with 
software. These people get counted 
again when they visit your site again. 
That’s bias: your counts of unique 
visitors are systematically too high. 
One study found that unique-visitor 
counts were as much as 50% too high. 
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well as the confidence interval, confirming that the difference between men and 
women is highly significant. In Crunchlt!, the test and the confidence interval must 
be requested using separate commands, resulting in the two outputs in the figure. 
Excel spreadsheet output is not shown because Excel lacks menu items for inference 
about proportions. You must use the spreadsheet’s formula capability to program 
the confidence interval or test statistic and then to find the P-value of a test. JMP 
uses a slightly different formula than those given in the text for both the confidence 
interval and the test and the JMP output is not presented. 


() LaunchPad Online Resources 


* The Snapshots video, Comparing Two Proportions, provides an example in 
which the comparison of two proportions is appropriate, and gives some 
details for computing the large-sample confidence interval. 


* The StatBoards video, Confidence Intervals for Comparing Two Proportions, 
provides an additional example of computing the large-sample confidence 
interval which includes the specifics for computing the interval. 


Apply Your Knowledge 


23.1 


23.2 


23.3 


Health Online. In 2012, a random sample of 2392 Internet users found ap- 
proximately 55% had used the Internet to obtain health information about 
a specific disease or medical problem in the last year. Of the 1084 men in 
the survey, 520 used the Internet to learn about a specific disease or medical 
problem, whereas 811 of the 1308 women in the survey had done so.’ Give a 
95% confidence interval for the difference between the proportions of male 
and female Internet users who used the Internet to obtain health information 
about a specific disease or medical problem in the last year. Follow the four- 
step process as illustrated in Examples 23.2 and 23.3. 


Marijuana Use in High School. The Youth Risk Behavior Surveillance 
System (YRBSS) monitors health risk behaviors among U.S. high school stu- 
dents, which include tobacco use, alcohol and drug use, inadequate physi- 
cal activity, unhealthy diet, and risky sexual behavior. Approximately 9% of 
youth have tried marijuana before reaching age 13 years. How does mari- 
juana use change over the high school years? In 2013, the survey randomly 
selected 3500 ninth-graders and 3497 twelfth-graders and asked them if they 
had used marijuana one or more times during the previous 30 days.’ Of these 
students, 620 ninth-graders and 969 twelfth-graders said “yes.” Give a 99% 
confidence interval for the difference between the proportions of all ninth- 
and twelfth-grade high school students who smoked marijuana at least once 
in the 30 days prior to the survey. Follow the four-step process as illustrated 
in Examples 23.2 and 23.3. 


A Question on the Environment. In 2010, respondents to the Gen- 
eral Social Survey were asked how much they agreed or disagreed with 
the statement: “Many of the claims about the environment are exagger- 
ated.”* Among the 251 respondents aged 18 to 29, 75 said either “agree” 
or “strongly agree,” whereas among the 376 respondents over age 60, 174 
gave one of these responses. Give a 95% confidence interval for the differ- 
ence between the proportion of people aged 18 to 29 and the proportion of 
people over 60 who feel that many of the claims about the environment are 
exaggerated. Follow the four-step process as illustrated in Examples 23.2 
and 23.3. 


23.5 Significance Tests for Comparing Proportions 


An observed difference between two sample proportions can reflect an actual 
difference between the populations, or it may just be due to chance variation in ran- 
dom sampling. Significance tests help us decide if the effect we see in the samples is 
really there in the populations. The null hypothesis says that there is no difference 
between the two populations: 


Ho: pi = p2 (the same as Hp: p; — p2 = 0) 


The alternative hypothesis says what kind of difference we expect. 


(STATE? The political event depicted in Figure 23.3 was remembered by about 31% of 
those surveyed, despite the fact that it never occurred. In 2010, S/ate, a current affairs 
and culture magazine, surveyed over 5000 readers regarding their perspective on 
several past political events and their memories of these events. Unbeknownst to those 
surveyed, one of the events shown to each participant was fabricated, making this the 
largest false memory study conducted to date.® The hypothesis of interest was whether 
political preferences guided the formation of false memories, as false memories have 
been shown to be more easily implanted in memory when they are congruent with a 
person's preexisting attitudes. Figure 23.3 was viewed by 616 participants who cat- 
egorized themselves as progressive and 49 participants who categorized themselves 
as conservative. The event was falsely remembered as having occurred by 212 of the 
progressives surveyed and by 7 of the conservatives. How strong is the evidence that a 
larger proportion of progressives have a false memory of this event than conservatives? 


“PLAN: Call the population proportions p, for progressives and p2 for conservatives. 
Because the i image of a conservative president relaxing at his ranch during a major crisis 
is more congruent with the preexisting attitude of progressives, our hypothesis gives a 
direction for the difference before looking at the data, so we have the one-sided alternative: 


Ho: Pi = Pez 
Hy: p1 > Pz 


ESQNVAI Consider those who classify themselves as progressives and conservatives 
as separate SRSs of progressive and conservative Slate readers. The sample propor- 
tions who falsely remembered the event in Figure 23.3 are 


ae 0.344 (progressives) 
Py 616 . prog 


Pp2 =—~ = 0.143 (conservatives) 

49 
That is, 34% of those classifying themselves as progressive falsely remembered the 
event in Figure 23.3, but only 14% of conservatives falsely remembered it. Is this 
apparent difference statistically significant? To continue the solution, we must learn 
the proper test. m 


To do a hypothesis test, standardize the difference between the sample propor- 
tions f,; — py» to get a statistic. If Hy is true, both samples come from populations 
in which the same unknown proportion p have a false memory of the event depicted 
in Figure 23.3. We take advantage of this by combining the two samples to estimate 


FIGURE 23.3 
September 1, 2005: As parts of New 
Orleans lie underwater in the wake 

of Hurricane Katrina, President Bush 


entertains 


Houston Astros pitcher 


Roger Clemens at his ranch in 


Crawford, 
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Photo illustration by Holly Allen 
(Slate Magazine) 


Texas, for Example 23.4. 
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pooled sample this single p instead of estimating p; and p) separately. Call this the pooled sample 
proportion proportion. It is 


a number of successes in both samples combined 


number of individuals in both samples combined 


Substituting p in place of both fh, and fp) in the expression for the standard error 
SE;,-¢, of f; — pz, we get a z statistic that has the standard Normal distribution 
when Hp is true. Here is the test. 


Significance Test for Comparing Two Proportions 


Draw an SRS of size n; from a large population having proportion p, of successes and 
draw an independent SRS of size n; from another large population having proportion 
p2 of successes. To test the hypothesis Ho: p; = pz, first find the pooled proportion 
p of successes in both samples combined. Then compute the z statistic 


bi — bo 


- foo -a(t +4) 


In terms of a variable Z having the standard Normal distribution, the P-value for a 
test of Hy against 


Hep =p is PZ =) A X 
Hep = py 1s F(Z = 2) y | XS 
Dy ey es 212 =e) | N 


Use this test when the counts of successes and failures are each five or more in both 
samples.° 


ESRI The data come from an SRS, and the counts of successes and failures are 
all larger than five. The pooled proportion of progressives and conservatives who 
falsely remembered this event is 


, number “falsely remembered event” among progressives and conservatives combined 
p — 


number of progressives and conservatives combined 
212+ 7 

~ 616 + 49 

219 


=~ = 0.329 
665 
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The z test statistic is 
ps = Po 


Vee a+) 


0.344 — 0.143 


1 1 
329) 0.671) — 
{es ee er +) 
0.201 
= =2, 
0.0697 si 


The one-sided P-value is the area under the standard Normal curve greater than 2.88. 
Figure 23.4 shows this area. Software tells us that P = 0.00199. 


Standard Normal 
curve P-value is the area 
under the curve to 

the right of z = 2.88 


eS 2.38 FIGURE 23.4 
The P-value for the one-sided test of 
Example 23.5. 


Without software, you can compare z = 2.88 with the bottom row of Table C (standard s# 2.807 3.091 
Normal critical values) to approximate P. - 

It lies between the critical values 2.807 and 3.091 for one-sided P-values 0.0025 One-sided P_— 0.0025 0.001 
and 0.001. 


| CONCLUDE: Silitarems strong evidence (P < 0.0025) that, among Slate readers, 
progressives are more likely than conservatives to have a false memory of Mr. Bush 
vacationing with Roger Clemens in the wake of the Katrina hurricane. In a second 
fabricated event by the authors, it was found that conservatives were more likely than 
progressives to falsely remember President Obama shaking hands with former Iranian 
President Ahmadinejad at a United Nations conference. m 


The sample survey in this example selected a single sample of Slate readers, not 
two separate samples of progressives and conservatives. To get two samples, we 
divided the single sample by political orientation. This means that we did not know 
the two sample sizes n, and n, until after the data were in hand. The two-sample zx 
procedures for comparing proportions are valid in such situations. This is an impor 
tant fact about these methods. 


(4) LaunchPad Online Resources 


* The StatBoards video, Hypothesis Testing for a Difference in Proportions, 
provides an additional example of the large-sample test for comparing two 
proportions. 
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Apply Your Knowledge 


23.4 Seat Belt Use. The proportion of drivers who use seat belts depends on 

4s factors such as age (young people are more likely to go unbelted) and sex 
(women are more likely to use belts). It also depends on local law. In New 

York City, police can stop a driver who is not belted. In Boston at the time 

of the survey, police could cite a driver for not wearing a seat belt only if 

the driver had been stopped for some other violation. Here are data from 

observing random samples of female Hispanic drivers in these two cities 


in 2002: 
City Drivers Belted 
New York 220 183 
Boston 117 68 


(a) Is this an experiment or an observational study? Why? 


(b) Comparing local laws suggests the hypothesis that a smaller proportion 
of drivers wear seat belts in Boston than in New York. Do the data give 
good evidence that this is true for female Hispanic drivers? Follow the 
four-step process as illustrated in Examples 23.4 and 23.5. 


x 
(=) 
8 | 
3 
8 
<2 
2 
SH 
3m 
Z 
= 
8 
a 
=| 
ot 


23.5 Protecting Skiers and Snowboarders. Most alpine skiers and snow- 
boarders do not use helmets. Do helmets reduce the risk of head injuries? 
A study in Norway compared skiers and snowboarders who suffered head 
injuries with a control group who were not injured. Of 578 injured subjects, 
96 had worn a helmet. Of the 2992 in the control group, 656 wore helmets.§ 
Is helmet use less common among skiers and snowboarders who have head 
injuries? Follow the four-step process as illustrated in Examples 23.4 and 
23.5. (Note that this is an observational study that compares injured and 
uninjured subjects. An experiment that assigned subjects to helmet and no- 
helmet groups would be more convincing.) 


surgeons remove two or three lymph nodes close to the breast that are 
most likely to contain cancer cells. If these “sentinel” lymph nodes are free 
of tumors, SLND alone is the accepted management for patients. When 
the sentinel lymph nodes contain metastases, axillary lymph node dissec- 
tion (ALND, or the removal of further nodes) remains the standard of 
care, although its contribution to survival is controversial. ALND car- 
ries the additional risk of complications such as seroma, infection, and 
lymphedema. In one study, patients with sentinel metastases identified 
by SLND were randomized to undergo ALND or no further treatment 
(SLND alone). Here are the five-year disease-free survival numbers for 
the two groups:? 


4. 23.6 Breast Cancer Treatment. In sentinel lymph node dissection (SLND), 


Group Sample Size Disease-Free after Five Years 
ALND 420 345 
SLND alone 436 366 


How strong is the evidence that the proportion of disease-free patients after 
five years for patients who underwent ALND in addition to SLND differs 
from that of those who underwent SLND only? Follow the four-step process 
as illustrated in Examples 23.4 and 23.5. 
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Like the large-sample confidence interval for a single proportion p, the large- 

A sample interval for p; — pz generally has a true confidence level less than the 

level you asked for. The inaccuracy is not as serious as in the one-sample 

case, at least if our guidelines for use are followed. Once again, adding imaginary 
observations can improve the accuracy.!° 


Plus Four Confidence Interval 


for Comparing Two Proportions 


Draw independent SRSs from two large populations with population proportions 
of successes p; and p>. To get the plus four confidence interval for the difference 
P1 — P2, add four imaginary observations: one success, and one failure in each of the 
two samples. Then use the large-sample confidence interval with the new sample 
sizes (actual sample sizes + 2) and counts of successes (actual counts + 1). 


Use this interval when the sample size is at least 5 in each group, with any counts of 
successes and failures. 


If your software does not offer the plus four method, just enter the new plus four 
sample sizes and success counts into the large-sample procedure. 


effects of intensive early childhood education on children who were at high risk based 
on several sociodemographic indicators. The project randomly assigned children to a 
treatment group, which was provided with early educational activities before kinder- 
garten, and the remainder to a control group. A recent follow-up study interviewed 
subjects at age 30 and compared the college graduation rates (earning a four-year 
degree).'' Here are the data for two groups: 


(STATE! The Abecedarian Project was a randomized controlled study to assess the 4. 


Population Sample Number of Sample 
Population Description Size Successes Proportion 
1 treatment ny = 52 12 py = 12/52 = 0.2308 
2 control | = 49 3 i/o 0.0612 


How much does early childhood education increase the proportion earning a four-year 
degree by age 30? 

“PLAN: Give a 90% confidence interval for the difference between population pro- 
portions, Pp; — Po. 


ESQRVS The conditions for the large-sample interval are not met because there are 
only three successes in the control group. However, we can use the plus four method 
because the sample sizes for the treatment and the control group are both at least 
5. Add four imaginary observations. The new data summary is 


*This section is optional. 
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Population Sample Number of Plus Four Sample 
Population Description Size Successes Proportion 
1 treatment n+2=54 12+1=13_ p, = 13/54 = 0.2407 
2 control mt+2=51 3+1= 4 p,= 4/51 = 0.0784 
The standard error based on the new facts is 
pi _ P:) pat = Po) 
E;,-», = + 
SEA, = n> + 2 
——- % (0.0784)(0.9216) 
7 54 


= V 0.00480 = 0.0693 


The plus four 90% confidence interval is 
(6, — B2) + Z2*SE;,-;, = (0.2407 — 0.0784) + (1.645)(0.0693) 
= 0.1623 + 0.1140 
= 0.048 to 0.276 


ESOMSEUDA We are 90% confident that the early childhood education program 
increases the proportion of high-risk children who earn a four-year degree by age 30 
by between 4.8% and 27.6%. 


The plus four interval may be conservative (that is, the true confidence level may 
be higher than you asked for) for very small samples and population p values close to 
O or 1. It is generally much more accurate than the large-sample interval when the 
samples are small. Nevertheless, the plus four interval in Example 23.6 cannot save 
us from the fact that sample sizes around 50 will produce a wide confidence interval 
when comparing two proportions. 


Apply Your Knowledge 


23.7 In-Line Skaters. A study of injuries to in-line skaters used data from the 
National Electronic Injury Surveillance System, which collects data from a 
random sample of hospital emergency rooms. The researchers interviewed 
161 people who came to emergency rooms with injuries from in-line skating. 
Wrist injuries (mostly fractures) were the most common.” 


(a) The interviews found that 53 people were wearing wrist guards, and 
6 of these had wrist injuries. Of the 108 who did not wear wrist guards, 
45 had wrist injuries. Why should we not use the large-sample confi- 
dence interval for these data? 


(b) The plus four method adds one success and one failure in each sample. 
What are the sample sizes and counts of successes after you do this? 


© Jupiterimages/Comstock Images/Alamy 


(c) Give the plus four 95% confidence interval for the difference between 
the two population proportions of wrist injuries. State carefully what 
populations your inference compares. 


23.8 Shrubs that Withstand Fire. Fire is a serious threat to shrubs in dry 
climates. Some shrubs can resprout from their roots after their tops are 
destroyed. One study of resprouting took place in a dry area of Mexico. 
The investigators first clipped the tops of all the shrubs in the study. For the 


treatment, they then applied a propane torch to the stumps of the shrubs 
to simulate a fire, and for the control the stumps were left alone. The study 
included 24 shrubs of which 12 were randomly assigned to the treatment and 
the remaining 12 to the control. A shrub is a success if it resprouts. For the 
shrub Xerospirea hartwegiana, all 12 shrubs in the control group resprouted, 
whereas only 8 in the treatment group resprouted. How much does burning 
reduce the proportion of shrubs of this species that resprout? Give the 95% 
plus four confidence interval for the amount by which burning reduces the 
proportion of shrubs of this species that resprout. The plus four method is 
particularly helpful when, as here, a count of either successes or failures is 
zero. Follow the four-step process as illustrated in Example 23.6. 


CHAPTER 23 SUMMARY 


Chapter Specifics 


The data in a two-sample problem are two independent SRSs, each drawn from a 
separate population. 


Tests and confidence intervals to compare the proportions p, and p, of successes in the 
two populations are based on the difference f, — fp; between the sample proportions 
of successes in the two SRSs. 


When the sample sizes n, and n; are large, the sampling distribution of f; — f is close 
to Normal with mean p, — p. 


The level C large-sample confidence interval for p,; — pz is 
(Oy = Pa) BaP SE yg 
where the standard error of f, — fy is 


Tr Ae Yee: 
SE; 4 - ve Bi) , Bol ~ by) 


ny nN 


and z* is a standard Normal critical value. 


The true confidence level of the large-sample interval can be substantially less than the 
planned level C. Use this interval only if the counts of successes and failures in both 
samples are 10 or greater. 


Significance tests for Ho: p; = pz use the pooled sample proportion 


‘ number of successes in both samples combined 


number of individuals in both samples combined 


and the z statistic 


P-values come from the standard Normal distribution. Use this test when there are five 
or more successes and five or more failures in both samples. 


(Optional topic) When the conditions for the large-sample confidence interval are 
not met, to get a more accurate confidence interval, add four imaginary observations, 
one success and one failure in each sample. Then use the same formula for the confi- 
dence interval. This is the plus four confidence interval. You can use it whenever both 
samples have five or more observations. 


Summary 
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Link It 


Most studies compare two or more treatments rather than investigating a single treatment. 
These can be observational studies or comparative experiments, and the differences in the 
types of conclusions that can be reached were described in Chapter 9. This chapter consid- 
ers the case where the response classifies individuals into two categories, such as black peo- 
ple being willing to date someone who is white or not. The comparison is then a comparison 
of this proportion for two groups, such as the proportion of black males who are willing to 
date someone who is white versus the proportion of black females who are willing to date 
someone who is white. Because the methods of this chapter are approximate, it is important 
always to check the conditions required for the approximations to work well, whether you 
are using the large-sample confidence interval, the z test, or the plus four method. 

After the appropriate statistical method has been applied, care must be taken when 
stating the conclusion. The issues described in Chapter 9 are still important. For example, 
a lack of blinding can result in the expectations of the researcher influencing the results, 
whereas confounding can mix up the comparison of the two groups with other factors. 

When there are two treatments and the response is a quantitative measurement, the 
comparison between the treatments is often based on a comparison of the treatment means. 
These methods were described in Chapter 21. 


(4) LaunchPad Online Resources 


If you are having difficulty with any of the sections of this chapter, these online 
resources should help prepare you to solve the exercises at the end of this 
chapter. 


* StatTutor starts with a video review of each section and asks a series of 
questions to check your understanding. 


* LearningCurve provides you with a series of questions about the chapter 
geared to your level of understanding. 


Exercises 
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| CHECK YOUR SKILLS 


In 2012, respondents to the General Social Survey were asked: 
“There are always some people whose ideas are considered bad or 
dangerous by other people. For instance, somebody who is against 
churches and religion. If such a person wanted to make a speech in 
your (city/town/community) against churches and religion, should 
he be allowed to speak, or not?” 


Among the 573 male respondents, 459 said, “allow,” whereas 
among the 719 female respondents, 552 said, “allow.” Is there a 
difference between the proportion of males and the proportion of 
females who would allow an anti-religionist to speak? Exercises 23.9 
to 23.13 are based on these results. 


23.9 Take py and p; to be the proportions of all males and 
females who would allow an anti-religionist to speak. 
The hypotheses to be tested are 
(a) Ho: py = pp versus H,: py ¥ pr. 
(b) Ho: pm = pe versus H,: py > Dr. 
(c) Ho: pq = pr versus H,: py < pe. 
23.10 The sample proportions of males and females who 
would allow an anti-religionist to speak are 
(a) fy = 0.801 and p; = 0.768. 
(b) py = 0.832 and pp = 0.797. 
(c) py = 0.832 and f; = 0.801. 
23.11 The pooled sample proportion of respondents who 
would allow an anti-religionist to speak is 
(a) b = 0.768. (b) fb = 0.783. (c) p = 0.801. 
23.12 The z test for comparing the proportions of males 
and females who would allow an anti-religionist to 
speak has 
(QI? =< Web 
(b) 0.05 < P < 0.10. 
P= 010) 


23.13 


23.14 


23.15 


23.16 


The 90% large-sample confidence interval for the 
difference py — py in the proportions of males and 
females who would allow an anti-religionist to speak 
is about 


(a) 0.033 + 0.002. 
(b) 0.033 + 0.038. 
(c) 0.033 + 0.045. 


In an experiment to learn if substance M can help re- 
store memory, the brains of 20 rats were treated to dam- 
age their memories. The rats were trained to run a maze. 
After a day, 10 rats were given M, and seven of them 
succeeded in the maze; only two of the 10 control rats 
were successful. The z test for “no difference” against “a 
higher proportion of the M group succeeds” has 


(Qi oe 01025 
(b) z = 2.60, P < 0.005. 
(c) z = 2.25, 0.02 < P < 0.04. 


The z test in Exercise 23.14 


(a) may be inaccurate because the populations are 
too small. 

(b) may be inaccurate because some counts of 
successes and failures are too small. 

(c) is reasonably accurate because the conditions 
for inference are met. 


(Optional topic) The plus four 90% confidence in- 
terval for the difference between the proportion of 
rats that succeed when given M and the proportion 
that succeed without it is 


(a) 0.455 = 0.312. 
(b) 0.417 + 0.304. 
(c) 0.417 + 0.185. 


CHAPTER 238 EXERCISES 


When using the large-sample methods of this chapter, be sure to 
check that the guidelines for their use are met, and to state your 
conclusions in context. 


23.17 Truthfulness in online profiles. Many teens have 
posted profiles on a social-networking website. A 
sample survey in 2007 asked a random sample of 
teens with online profiles if they included false infor- 
mation in their profiles. Of 170 younger teens (aged 
12 to 14), 117 said “yes.” Of 317 older teens (aged 15 
to 17), 152 said “yes.” © 


(a) Do these samples satisfy the guidelines for the 
large-sample confidence interval? 

(b) Give a 95% confidence interval for the differ- 
ence between the proportions of younger and older 


23.18 


teens who include false information in their online 
profiles. 


Effects of an appetite suppressant. Subjects with 
pre-existing cardiovascular symptoms who were re- 
ceiving sibutramine, an appetite suppressant, were 
found to be at increased risk of cardiovascular events 
while taking the drug. The study included 9804 
overweight or obese subjects with pre-existing car- 
diovascular disease and/or type 2 diabetes. The sub- 
jects were randomly assigned to sibutramine (4906 
subjects) or a placebo (4898 subjects) in a double- 
blind fashion. The primary outcome measured was 
the occurrence of any of the following events: non- 
fatal myocardial infarction or stroke, resuscitation 
after cardiac arrest, or cardiovascular death. The 
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primary outcome was observed in 561 subjects in the 
sibutramine group and 490 subjects in the placebo 
group.!° 

(a) Find the proportion of subjects experiencing 

the primary outcome for both the sibutramine and 
placebo groups. 

(b) Can we safely use the large-sample confidence 
interval for comparing the proportions of sibutra- 
mine and placebo subjects who experienced the 
primary outcome? Explain. 

(c) Give a 95% confidence interval for the differ- 
ence between the proportions of sibutramine and 
placebo subjects who experienced the primary 
outcome. 


(Optional topic) Genetically altered mice. Genet- 
ic influences on cancer can be studied by manipulat- 
ing the genetic makeup of mice. One of the processes 
that turn genes on or off (so to speak) in particular 
locations is called “DNA methylation.” Do low lev- 
els of this process help cause tumors? Compare mice 
altered to have low levels with normal mice. Of 33 
mice with lowered levels of DNA methylation, 23 
developed tumors. None of the control group of 18 
normal mice developed tumors in the same time pe- 
TOG 


Mark Harmel/Getty Images 


(a) Explain why we cannot safely use either the 
large-sample confidence interval or the test for 
comparing the proportions of normal and altered 
mice that develop tumors. 

(b) The plus four method adds two observations, a 
success and a failure, to each sample. What are the 
sample sizes and the numbers of mice with tumors 
after you do this? 

(c) Give a 99% confidence interval for the difference 
in the proportions of the two populations that develop 
tumors. 


Effects of an appetite suppressant, continued. 
Exercise 23.18 describes a study to determine if 
subjects with pre-existing cardiovascular symptoms 
were at an increased risk of cardiovascular events 
while taking sibutramine. Do the data give good 
reason to think that there is a difference between 


the proportions of treatment and placebo subjects 
who experienced the primary outcome? (Note that 
sibutramine has not been available in the United 
States since the end of 2010 due to its manufac- 
turer’s concerns over increased risk of heart attack 
or stroke, although at the present time it can still be 
purchased in other countries.) 


(a) State hypotheses, find the test statistic, and 
use either software or the bottom row of Table C 
for the P-value. Be sure to state your conclusion. 
(b) Explain simply why it was important to have a 
placebo group in this study. 


Adolescence, music, and algebra. Research has suggested that 
musicians process music in the same cortical regions in which ado- 
lescents process algebra. When taking introductory algebra, will 
students who were enrolled in formal instrumental or choral music 
instruction during middle school outperform those who experienced 
neither of these modes of musical instruction? The sample consisted 
of 6026 ninth-grade students in Maryland who had completed in- 
troductory algebra. Of these, 3239 students had received formal 
instrumental or choral instruction during all three years of middle 
school, whereas the remaining students had not. Of those receiving 
formal musical instruction, 2818 received a passing grade on the 
Maryland Algebra/Data Analysis High School Assessment (HSA). 
In contrast, 2091 of the 2787 students not receiving musical in- 
struction received a passing grade.'® Exercises 23.21 to 23.23 are 
based on this study. 


23.21 Does music make a difference? 


(a) Is there a significant difference in the propor- 
tions of students with and without musical instruc- 
tion who receive a passing grade on the Maryland 
HSA? State hypotheses, find the test statistic, and 
use software or the bottom row of Table C to get a 
P-value. 

(b) Is this an observational study or an experiment? 
Why? 

(c) In view of your answer in part (b), carefully state 
your conclusions about the relationship between 
music instruction and success in algebra. 


23.22 How many students pass? Give a 95% confidence 
interval for the proportion of ninth-grade students 


who receive a passing grade on the HSA. 


23.23 How big a difference? Give a 95% confidence in- 
terval for the difference between the proportions 
of students passing the HSA who have received or 
not received formal musical instruction in middle 


school. 


23.24 The design matters. Due to concerns about the 
safety of bariatric weight loss surgery, in 2006 
the Centers for Medicare and Medicaid Services 
(CMS) restricted coverage of bariatric surgery to 


hospitals designated as Centers for Excellence. Did 


23.25 


the CMS restriction improve the outcomes of bar- 
iatric surgery for Medicare patients? Among the 
1847 Medicare patients in the study having sur- 
gery in the 18 months preceding the restriction, 
270 experienced overall complications, whereas 
in the 1639 patients having bariatric surgery in the 
18 months following the restriction, 170 experi- 
enced overall complications.” 


(a) What are the sample proportions of Medicare 
patients who experienced overall complications 
from bariatric surgery before and after the CMS 
restriction on coverage? How strong is the evi- 
dence that the proportions of overall complications 
are different before and after the CMS restriction? 
Use an appropriate hypothesis test to answer this 
question. 

(b) Is this an observational study or an experiment? 
Can we conclude that the CMS restriction has 
reduced the proportion of overall complications? 
(c) Improved outcomes may be due to several 
factors, including the use of lower-risk bariatric 
procedures, increased surgeon experience, or 
healthier patients receiving the surgery. What 
types of variables are these, and how do they 
affect the types of conclusions that you can 
make? 

(d) In a second study,”° a control group consisting 
of non-Medicare patients obtained before and after 
the restrictions on coverage was obtained. When 
compared with this control group, no significant 
evidence was found that the CMS restriction 
reduced the proportion of overall complications 

for the Medicare group. Explain in simple language 
how comparison with a control group could help 
reduce the effects of some of the variables described 
in part (c)? 


Significant does not mean important. Never for- 
get that even small effects can be statistically signifi- 
cant if the samples are large. To illustrate this fact, 
consider a sample of 148 small businesses. During a 
three-year period, 15 of the 106 headed by men and 
7 of the 42 headed by women failed.! 


(a) Find the proportions of failures for businesses 
headed by women and businesses headed by 
men. These sample proportions are quite close to 
each other. Give the P-value for the z test of the 
hypothesis that the same proportion of women’s 
and men’s businesses fail. (Use the two-sided 
alternative.) The test is very far from being 
significant. 

(b) Now suppose that the same sample proportions 
came from a sample 30 times as large. That is, 210 
out of 1260 businesses headed by women and 450 
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out of 3180 businesses headed by men fail. Verify 
that the proportions of failures are exactly the same 
as in part (a). Repeat the z test for the new data, 
and show that it is now significant at the a = 0.05 
level. 

(c) It is wise to use a confidence interval to estimate 
the size of an effect rather than just giving a P-value. 
Give the large sample 95% confidence intervals for 
the difference between the proportions of women’s 
and men’s businesses that fail for the settings of 
both parts (a) and (b). What is the effect of larger 
samples on the confidence interval? 


In responding to Exercises 23.26 to 23.35, follow the Plan, Solve, 
and Conclude steps of the four-step process. 


23.26 


23.27 


23.28 


23.29 


Demographics of social networking sites: sex. 
A Pew Internet survey in 2012 examined several 
demographic variables of users of social networking 
sites including sex, age, race, education, and income. 
Of the 1802 Internet users included in the survey, 
846 were men and 525 of these used social network- 
ing sites. Among the 956 women in the survey, 679 
of the women used social networking sites.” Is there 
good evidence that the proportion of Internet users 
who use social networking sites is different for men 
and women? 


Demographics of social networking sites: race. 
The survey in Exercise 23.26 also looked at pos- 
sible differences in the proportions of white and 
Hispanic Internet users who used social network- 
ing sites. They found that 866 of the 1332 white 
Internet users and 111 of the 154 Hispanics used 
social networking sites. Is there evidence of a dif- 
ference between the proportions of white and His- 
panic Internet users who use social networking 
sites? 


More on social networking and sex. Continue 
your work from Exercise 23.26. Estimate the differ- 
ence between the proportions of male and female 
Internet users who use social networking sites. (Use 


90% confidence.) 


Smoking cessation. Chantix is different from most 
other quit-smoking products in that it targets nico- 
tine receptors in the brain, attaches to them, and 
blocks nicotine from reaching them. As part of a 
larger randomized controlled trial, generally healthy 
smokers who smoked at least 10 cigarettes per day 
were assigned at random to take Chantix or a pla- 
cebo. The study was double-blind, with the response 
measure being continuous absence from smoking for 
weeks 9 through 12 of the study. Of the 352 subjects 
taking Chantix, 155 abstained from smoking during 
weeks 9 through 12, whereas 61 of the 344 subjects 
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taking the placebo abstained during this same time 
period.”? Give a 99% confidence interval for the dif- 
ference (treatment minus placebo) in the propor- 
tions of smokers who abstained from smoking during 
weeks 9 through 12. 


The Gold Coast. A historian examining Brit- 
ish colonial records for the Gold Coast in Africa 
suspects that the death rate was higher among 
African miners than among European miners. 
In the year 1936, there were 223 deaths among 
33,809 African miners and 7 deaths among 1541 
European miners on the Gold Coast.** (The Gold 
Coast became the independent nation of Ghana in 
1957.) Consider this year as a random sample from 
the colonial era in West Africa. Is there good evi- 
dence that the proportion of African miners who 
died was higher than the proportion of European 
miners who died? 


I refuse! Do our emotions influence economic de- 
cisions? One way to examine the issue is to have 
subjects play an “ultimatum game” against other 
people and against a computer. Your partner (per- 
son or computer) gets $10, on the condition that it 
be shared with you. The partner makes you an offer. 
If you refuse, neither of you gets anything. So it’s 
to your advantage to accept even the unfair offer 
of $2 out of the $10. Some people get mad and re- 
fuse unfair offers. Here are data on the responses of 
76 subjects randomly assigned to receive an offer 
of $2 from either a person they were introduced to 
or a computer:” 


ACCEPT REJECT 


Human offers 20 18 
Computer offers ay 6 


We suspect that emotion will lead to offers from an- 
other person being rejected more often than offers 
from an impersonal computer. Do a test to assess the 
evidence for this conjecture. 


Did the random assignment work? A large clinical 
trial of the effect of diet on breast cancer assigned 
women at random to either a normal diet or a low-fat 
diet. To check that the random assignment did pro- 
duce comparable groups, we can compare the two 
groups at the start of the study. Asked if there is a 
family history of breast cancer, 3396 of the 19,541 
women in the low-fat group and 4929 of the 29,294 
women in the control group said “yes.””6 If the ran- 
dom assignment worked well, there should not be a 
significant difference in the proportions with a family 
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23.34 


history of breast cancer. How significant is the ob- 
served difference? 


(Optional topic) Lyme disease. Lyme disease is 
spread in the northeastern United States by in- 
fected ticks. The ticks are infected mainly by feed- 
ing on mice, so more mice result in more infected 
ticks. The mouse population in turn rises and falls 
with the abundance of acorns, their favored food. 
Experimenters studied two similar forest areas in a 
year when the acorn crop failed. They added hun- 
dreds of thousands of acorns to one area to imitate 
an abundant acorn crop, while leaving the other 
area untouched. The next spring, 54 of the 72 mice 
trapped in the first area were in breeding condi- 
tion, versus 10 of the 17 mice trapped in the second 
area.”’ Estimate the difference between the propor- 
tions of mice ready to breed in good acorn years and 
bad acorn years. (Use 90% confidence. Be sure to 
justify your choice of confidence interval.) 


Scott Camazine/Science Source 


Abecedarian early childhood education program: 
adult outcomes. The Abecedarian Project is a ran- 
domized controlled study to assess the effects of in- 
tensive early childhood education on children who 
were at high risk based on several sociodemographic 
indicators.** The project randomly assigned some 
children to a treatment group that was provided with 
early educational activities before kindergarten, and 
the remainder to a control group. A recent follow-up 
study interviewed subjects at age 30 and evaluated 
educational, economic, and socioemotional out- 
comes to learn if the positive effects of the program 
continued into adulthood. The follow-up study in- 
cluded 52 individuals from the treatment group and 
49 from the control group. Out of these, 39 from the 
treatment group and 26 from the control group were 
considered “consistently” employed (working 30+ 
hours per week in at least 18 of the 24 months prior 
to the interview). Does the study provide significant 
evidence that children who had early childhood 


education have a higher proportion of consistent 
employment than those who did not? How large is 
the difference between the proportions in the two 
populations that are consistently employed? Do in- 
ference to answer both questions. Be sure to explain 
exactly what inference you choose to do. 
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hand washing, and a control group, which was 
asked to use routine hand washing but to avoid 
the use of alcohol-based hand sanitizers. Here 
are the data on the numbers of subjects with and 
without RV infection in the two groups over the 
10-week study period:’ 


23.35 Hand sanitizers. Hand disinfection is frequently 

43 recommended for prevention of transmission of RV INFECTION 
the rhinovirus that causes the common cold. In YES ~=sNO 
particular, hand lotion containing 2% citric acid 
and 2% malic acid in 70% ethanol (HL+) has 
been found to have both immediate and persistent 
ability to inactivate rhinovirus (RV) on the hands 
in an experimental setting. Is hand disinfection (a) Is this an experiment or an observational study? 
effective in reducing the risk of infection in a Why? 
natural setting? A total of 212 volunteers were as- 
signed at random to either the HL+ group, which 
used the hand lotion every three hours or after 


\elter 49 67 
Control group 49 47 


(b) Do the data give good evidence that hand 
sanitizers reduce the chance of an RV infection? 


Exploring the Web 


Hearing loss in adolescents. Go to the Journal of the American Medical 
Association website, http: //jama.ama-assn.org/content/by/year, 
and find the article “Change in Prevalence of Hearing Loss in US Adolescents” 
by Shargorodsky et al. in the August 18, 2010, issue. If you cannot get the full 
text of the article, use the information in the abstract plus the information given 
here to answer the questions. NHANES III is the earlier sample, and NHANES 
2005-2006 is the more recent sample. 


(a) Is this an observational study or an experiment? 


(b) How many people were in the earlier sample, and how many were in the 
later sample? 


(c) If you do not have access to Table 2 of the full article, here are the facts 
that you will need: in the earlier study, 480 people experienced some hearing 
loss, whereas in the later study, 333 people experienced some hearing loss. Is 
there evidence of an increase in hearing loss for children aged 12 to 19 in the 
later study? State hypotheses, find the test statistic, and use either software or 
Table A to compute the P-value. Although the article used a more sophisticated 
analysis, your P-value should be quite close to the P-value of 0.02 reported in 
the abstract. 


23.37 Compare two surveys. Go to the website www.pollingreport .com, 
which contains the results of surveys conducted by several survey organizations. 
Choose a topic of interest to you, and then, to see if attitudes have changed 
over time, find two surveys that were conducted at two different times but that 
ask the same question. For example, you might choose the topic of abortion 
and compare the percents of people who feel abortion should always be legal 
at points in time separated by several years. State hypotheses to check for a 
difference over time, find the test statistic, and use either software or Table A to 
compute the P-value. What is your conclusion in context? 
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Inference about 
Variables: Part IV Review 


he procedures of Chapters 20 to 23 are among the most common of all sta- 
tistical inference methods. Now that you have mastered important ideas and 
practical methods for inference, it’s time to review the big ideas of statistics 
in outline form. Here is a summary of Parts I, II, and II of this book, leading up to 
Part IV. The outline contains some important warnings: look for the Caution icon. 


1. Data Production 


Data basics: 


Individuals (subjects). 
Variables: categorical versus quantitative, units of measurement, ex- 
planatory versus response. 
Purpose of study. 
Data production basics: 


Observation versus experiment. 

Simple random samples. 

Completely randomized experiments. 
Beware: bad data production (voluntary response, 
confounding) can make interpretation impossible. 


GOAT ER 24 


In this chapter 
we cover... 


Part IV Summary 
Test Yourself 


Supplementary Exercises 
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only one campus) can make generalizing conclusions difficult. 


A Beware: weaknesses in data production (for example, sampling students at 


2. Data Analysis 


Plot your data. Look for an overall pattern and striking deviations. 
Add numerical descriptions based on what you see. 
A Beware: averages and other simple descriptions can miss the real story. 
One quantitative variable: 
Graphs: stemplot, histogram, boxplot. 
Pattern: distribution shape, center, variability. Outliers? 
Density curves (such as Normal curves) to describe overall 
pattern. 
Numerical descriptions: five-number summary or X and s. 
Relationships between two quantitative variables: 
Graph: scatterplot. 
Pattern: relationship form, direction, strength. Outliers? Influential 
observations? 
Numerical description for linear relationships: correlation, regression 


line. 
A Beware the lurking variable: correlation does not imply causation. 


Beware the effects of outliers and influential observations. 


3. The Reasoning of Inference 


Inference uses data to infer conclusions about a wider population. 


When you do inference, you are acting as if your data come from random 
samples or randomized comparative experiments. Beware: if they don’t, 
you may have “garbage in, garbage out.” 


Always examine your data before doing inference. Inference often requires 
a regular pattern, such as roughly Normal with no strong outliers. 


Key idea: “What would happen if we did this many times?” 
Confidence intervals: estimate a population parameter. 


95% confidence: I used a method that captures the true parameter 95% 
of the time in repeated use. 

Beware: the margin of error of a confidence interval does not include 

A the effects of practical errors such as undercoverage and nonre- 
sponse. 
Significance tests: assess evidence against Hp in favor of H,,. 

P-value: If Hy were true, how often would I get an outcome favor- 
ing the alternative this strongly? Smaller P = stronger evidence 
against Ho. 

Statistical significance at the 5% level, P < 0.05, means that an 
outcome this extreme would occur less than 5% of the time if Ho 


were true. 
A Beware: P < 0.05 is not sacred. 
Beware: statistical significance is not the same as practical significance. 
Large samples can make small effects significant. Small samples can 
fail to declare large effects significant. 


Always try to estimate the size of an effect (for example, with a 
confidence interval), not just its significance. 
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4. Methods of Inference 


Choose the right inference procedure. 
Check the conditions for inference. 
Carry out the details. 


State your conclusion in context. 


Part IV of this book introduces the fourth and last part of this outline. To actu- 
ally do inference, you must choose the right procedure, check the conditions for 
inference, and carry out the details. The Statistics in Summary flowchart on the 
next page offers a brief guide. It is important to do some of the review exercises 
because now, for the first time, you must decide which of several inference proce- 
dures to use. Learning to recognize problem settings in order to choose the right 
type of inference is a key step in advancing your mastery of statistics. This is the 
Plan step in the four-step process, in which you translate the real-world problem 
from the State step into a specific inference procedure. 

The flowchart organizes one way of planning inference problems. Let’s go 
through it from left to right. 


1. Do you want to test a claim or estimate an unknown quantity? That is, will you 
need a test of significance or a confidence interval? 


2. Are your data a single sample representing one population or two samples chosen 
to compare two populations or responses to two treatments in an experiment? Re- 
member that to work with matched pairs data you form one sample from the 
differences within pairs. 


3. Is the response variable quantitative or categorical? Quantitative variables take 
numerical values with some unit of measurement such as inches or grams. The 
most common inference questions about quantitative variables concern mean 
responses. If the response variable is categorical, inference most often con- 
cerns the proportion of some category (call it a “success”) among the responses. 


The flowchart leads you to a specific test or confidence interval, indicated by a 
formula at the end of each path. The formula is just an aid to guide you toward the 
Solve and Conclude steps. You (or your technology) will use the formula as part of 
the Solve step, but don’t forget that you must do more. 


Are the conditions for this procedure met? Can you act as if the data come 
from a random sample or randomized comparative experiment? Does 
data analysis show extreme outliers or strong skewness that forbids use of 
inference based on Normality? Do you have enough observations for your 
intended procedure? 


Do your data come from an experiment or from an observational study? The 
details of inference methods are the same for both. But the design of the study 
determines what conclusions you can reach, because experiments give much 
better evidence that an effect uncovered by inference can be explained by 
direct causation. 


You may ask, as you study the Statistics in Summary flowchart, “What if I have 
an experiment comparing four treatments, or samples from three populations?” The 
flowchart allows only one or two, not three or four or more. Be patient: methods for 
comparing more than two means or proportions, as well as some other settings for 
inference, appear in Part V. 


in} HOW MANY 
i WAS THAT? 


Good causes often breed bad 
statistics. An advocacy group 
claims, without much evidence, that 
150,000 Americans suffer from the 
eating disorder anorexia nervosa. 
Soon someone misunderstands and 
says that 150,000 people die from 
anorexia nervosa each year. This wild 
number gets repeated in countless 
books and articles. It really is a wild 
number: only about 55,000 women 
aged 15 to 44 (the main group 
affected) die of all causes each year. 
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Here are the most important skills you should have acquired from reading 
Chapters 20 to 23. 


A. Recognition 

1. Recognize when a problem requires inference about population means 
(quantitative response variable) or population proportions (usually cat- 
egorical response variable). 

2. Recognize from the design of a study whether one-sample, matched pairs, 
or two-sample procedures are needed. 

3. Based on recognizing the problem setting, choose among the one- and two- 
sample t procedures for means and the one- and two-sample x procedures 
for proportions. 


B. Inference about One Mean 


sf 


6. 


Verify that the t procedures are appropriate in a particular setting. Check 
the study design and the distribution of the data and take advantage of 
robustness against lack of Normality. 

Recognize when poor study design, outliers, or a small sample from a 
skewed distribution make the t procedures risky. 

Use the one-sample t procedure to obtain a confidence interval at a stated 
level of confidence for the mean pw of a population. 

Carry out a one-sample t test for the hypothesis that a population mean 
has a specified value against either a one-sided or a two-sided alternative. 
Use software to find the P-value or Table C to get an approximate value. 
Recognize matched pairs data and use the t procedures to obtain confi- 
dence intervals and to perform tests of significance for such data. 


Know that resampling can be used to estimate standard errors. 


Comparing Two Means 


iL. 


Verify that the two-sample t procedures are appropriate in a particular 
setting. Check the study design and the distribution of the data and take 
advantage of robustness against lack of Normality. 


Give a confidence interval for the difference between two means. Use soft- 
ware if you have it. Use the two-sample t statistic with conservative degrees 
of freedom and Table C if you do not have statistical software. 


Test the hypothesis that two populations have equal means against either 
a one-sided or a two-sided alternative. Use software if you have it. Use the 
two-sample t test with conservative degrees of freedom and Table C if you 
do not have statistical software. 

Know that procedures for comparing the standard deviations of two 
Normal populations are available, but that these procedures are risky 
because they are not at all robust against non-Normal distributions. 
Know that permutation tests can be used to test the hypothesis of no 
treatment effect in experiments in which units have been assigned to treat- 
ments completely at random. 


Inference about One Proportion 


iL. 


Verify that you can safely use either the large-sample or the plus four z pro- 
cedures in a particular setting. Check the study design and the guidelines 
for sample size. 

Use the large-sample x procedure to give a confidence interval for a popu- 
lation proportion p when the conditions are met. Otherwise, the plus four 
modification of the z procedure can give a confidence interval for p that is 
accurate even for small samples and for any value of p. 

Use the x statistic to carry out a test of significance for the hypothesis 
Ho: p = po about a population proportion p against either a one-sided or 
a two-sided alternative. Use software or Table A to find the P-value, or 
Table C to get an approximate value. 


Comparing Two Proportions 


ik. 


Verify that you can safely use either the large-sample or the plus four z pro- 
cedures in a particular setting. Check the study design and the guidelines 
for sample sizes. 
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2. Use the large-sample z procedure to give a confidence interval for the 
difference p,; — p, between proportions in two populations based on 
independent samples from the populations when the conditions are met. 
Otherwise, the plus four modification of the zx procedure can give a confi- 
dence interval for p; — p, that is accurate even for very small samples and 
for any values of p; and p. 

3. Use a z statistic to test the hypothesis Ho: p; = p; that proportions in two 
distinct populations are equal. Use software or Table A to find the P-value, 
or Table C to get an approximate value. 


The questions below include both multiple-choice, and short-answer questions and calculations. 
They will help you review the basic ideas and skills presented in Chapters 20 to 23. 


Calcium and blood pressure. In a randomized comparative experiment on the effect of 
dietary calcium on blood pressure, researchers divided 54 healthy white males at random into two 
groups. One group received calcium; the other, a placebo. At the beginning of the study, the re- 
searchers measured many variables on the subjects. The paper reporting the study gives x = 114.9 
and s = 9.3 for the seated systolic blood pressure of the 27 members of the placebo group. Use this 
information to answer Questions 24.1 and 24.2. 


24.1 A 95% confidence interval for the mean blood pressure in the population from 
which the subjects were recruited is 


(a) 113.1 to 116.7. 
(b) 111.8 to 118.0. 
(c) 111.2 to 118.6. 
(d) 109.9 to 119.9. 


24.2 What conditions for the population and the study design are required by the pro- 
cedure you used to construct your confidence interval? Which of these conditions 
are important for the validity of the procedure in this case? 


Does nature heal better? Our bodies have a natural electrical field that is known to 
help wounds heal. Does changing the field strength slow healing? A series of experiments with 
newts investigated this question. The data below are the healing rates of cuts (micrometers 
per hour) in a matched pairs experiment. The pairs are the two hind limbs of the same newt, 
with the body’s natural field in one limb (control) and half the natural value in the other limb 


(experimental).' 
Newt 1 2} Ses oe Le 
Control 25 13 44 45 57 42 50 36 35 38 43 31 26 48 
Experimental 24 23 47 42 26 46 38 33 28 28 21 27 25 45 
Ditferencer(Controll=3 | 10 359 31 452 3 10224 3 
Experimental) 


The mean and standard deviation of the differences ave 5.71 and 10.56 micrometers per hour, 
respectively. Use this information to answer Questions 24.3 and 24.4. 


24.3 Is there good evidence that changing the electrical field from its natural level 
slows healing? The P-value for your test is 


(a) less than 0.01. 
b) between 0.01 and 0.05. 
c) between 0.05 and 0.10. 


( 
( 
(d) greater than 0.10. 


24.4 — Explain why this is a matched pairs experiment. Give a 99% confidence interval 
for the difference in healing rates (control minus experimental). 


Computer use. A survey asked 3494 randomly selected high school seniors about their com- 
puter use.” Among the 1,734 females in the sample, 614 used computers three or more hours per 
day (playing video or computer games or using the computer for something that was not school 
work). Among the 1,760 men sampled, 676 used computers three or more hours per day. Use this 
information to answer Questions 24.5 to 24.8. 


24.5 The sample proportion of males who used computers for three or more hours a 
day is 
(a) 0.193. 
(b) 0.354. 
(c) 0.369. 
(d) 0.384. 


24.6 The pooled sample proportion of high school seniors who used computers for 
three or more hours a day is 


(a) 0.193. 
(b) 0.354. 
(c) 0.369, 
(d) 0.384. 


24.7 The sampling distribution for the difference in the proportions of male and female 
high school seniors who used computers for three or more hours a day has stan- 
dard error 


(a) 0.0043. 
(b) 0.0053. 
(c) 0.0081. 
(d) 0.0163. 


24.8 The z test for comparing the proportions of male and female high school seniors 
who used computers for three or more hours a day has 


(EN I? =< HONE, (od) OO << IP < O05, (c) 0.05 < P < 0.10. 
(d) P > 0.10. 


24.9 Wikipedia. A sample survey of 852 adult Internet users found 
that 53% look for information on the online collaborative en- 
cyclopedia Wikipedia.’ 

(a) Give the standard error SE of f, the proportion of 
all adult Internet users who look for information on 
Wikipedia. 

(b) Give a 95% confidence interval for the proportion 
of all adult Internet users who look for information on 
Wikipedia. 


Evil genius? Does cheating enhance creativity? Researchers recruited 178 
subjects and asked participants to guess whether the outcome of a virtual coin 
toss would be heads or tails. After indicating their prediction, participants had to press a button 
to toss the coin virtually. They were asked to press the button only once, but they were given the 
Opportunity to test the button several times before beginning the experiment. Thus, participants 
could cheat by pressing the button before making their prediction and appear to have pressed the 
button one time for each prediction. Participants then reported whether they had guessed correctly 
and received a $1 bonus if they had. The program recorded the outcomes of the initial virtual 
coin tosses so that researchers could tell whether participants cheated. After the coin-toss task, all 
participants were given a test that measured creativity. Here are the summary statistics for scores 
on the creativity test:* 
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Group n x s 
Cheaters 43 3.60 126 
Noncheaters 135 233 1.00 


Use this information to answer Questions 24.10 to 24.13. 


24.10 


24.11 


24.12 


24.13 


24.14 


A 95% confidence interval for the mean score on the creativity test for those 
subjects who cheated is 


(a) 3.60 + 0.19. 
(b) 3.60 + 0.39. 
(c) 3.60 + 2.02. 
(d) 1.27 + 0.43. 


A 95% confidence interval for the mean score on the creativity test for those 
subjects who did not cheat is 


(ai233 270109! 


(3) ABs se Ok 
(@) 73s) Se IAs. 
(Gl) Lay se Aish 


Is there a significant difference between the mean scores on the creativity test for 
those who cheated and those who did not? The value of the t statistic for testing 
the null hypothesis of no difference in the mean scores on the creativity test is 


(a) 0.21. 
(b) 1.27. 

(c) 4.76. 

(d) 6.03. 

Is there a significant difference between the mean scores on the creativity test 
for those who cheated and those who did not? The degrees of freedom using 
the conservative Option 2 for the t statistic for testing the null hypothesis of no 
difference in the mean scores on the creativity test is 


(a) 42. 

(b) 59.8. 
(c) 134. 
(d) 176. 


Butterflies mating. Here’s how butterflies mate: a male passes to a female a 
packet of sperm called a spermatophore. Females may mate several times. Will 
they remate sooner if the first spermatophore they receive is small? Among 
20 females who received a large spermatophore (greater than 25 milligrams), the 
mean time to the next mating was 5.15 days, with standard deviation 0.18 day. 
For 21 females who received a small spermatophore (about 7 milligrams), the 
mean was 4.33 days and the standard deviation was 0.31 day.’ Is the observed 
difference in means statistically significant? Test using the conservative Option 2 
for the degrees of freedom. The P-value is 


(a) less than 0.01. 
b) between 0.01 and 0.05. 
c) between 0.05 and 0.10. 


( 
( 
(d) greater than 0.10. 


Treating migraines. A randomized clinical trial of 135 youth aged 10 to 17 years diagnosed 
with chronic migraine were assigned to one of two treatments. One involved 10 cognitive behav- 
ioral therapy (CBT) sessions and use of the drug amitriptyline. The other involved 10 headache 
education sessions plus amitriptyline. Twenty weeks after treatment, the severity of migraines for 
each subject was assessed using the Pediatric Migraine Disability Assessment Score (PeAMIDAS). 
Here are data on PedMIDAS scores 20 weeks post-treatment for the two groups:° 


Group n Mean Standard deviation 
CBT 64 1585 17.4 
Education all 29.6 42.2 


Use this information to answer Questions 24.15 to 24.18. 


24.15 


24.16 


24.17 


24.18 


24.19 


24.20 


Both sets of endurance data are skewed to the right. How do we know this (note 
that PedMIDAS scores are greater than or equal to zero). Why are t procedures 
nonetheless reasonably accurate for these data? 


A 90% confidence interval for the mean PedMIDAS score for the CBT group is 


(a) 15.05 to 15.95. (c) 11.87 to 19.13. 
(b) 13.32 to 17.68. (d) 11.15 to 19.85. 


A 90% confidence interval for the mean difference (education minus CBT) in 
PedMIDAS scores, using the conservative Option 2 for the degrees of freedom, is 


(Ed) Shilish ite) SOV, (c) 8.64 to 19.56. 
(b) 4.98 to 23.22. (d) 21.24 to 37.96. 


Do the data show that there is a difference in mean PedMIDAS scores for the two 
treatments? Carry out an appropriate hypothesis test using Option 1 if you have 
technology that implements this method. Otherwise, use Option 2. 


Pre-readers in kindergarten. A school has two kindergarten classes. There 
are 21 children in Ms. Hazelcorn’s kindergarten class. Of these, 12 are “pre- 
readers”—children on the verge of reading. There are 19 children in Mr. Shapiro’s 
kindergarten class. Of these, 14 are pre-readers. Is there a difference in the pro- 
portions of “pre-readers” in the two classes? The z statistic is computed and the 
P-value is 0.273. 


(a) This test is not reliable because the samples are so small. 

(b) This test is of no use because we should be comparing means with a t statistic. 
(c) This test is reasonable because the counts of successes and failures are each 
five or more in both samples. 

(d) This test is not appropriate because these samples cannot be viewed as 
simple random samples taken from a larger population. 


State of the economy. If we want to estimate p, the population proportion of 
likely voters who believe the state of the economy to be the most urgent national 
concern, with 90% confidence and a margin of error no greater than 2%, how 
many likely voters need to be surveyed? Assume that you have no idea of the 
value of p. 


(a) 1024 (c) 4096 
(b) 1692 (d) 6765 


Favoritism for college athletes? Sports Illustrated surveyed a random sample of 757 
Division I college athletes in 36 sports. One question asked was, “Have you ever received preferen- 
tial treatment from a professor because of your status as an athlete?” Of the athletes polled, 225 
said “yes.” Use this information to answer Questions 24.21 to 24.23. 


24.21 


24.22 


The sample proportion of athletes who have received preferential treatment from 
a professor is 


(a) 0.160. (c) 0.703. 

(b) 0.297. (d) 0.840. 

The standard error SE of f, the proportion of athletes who have received prefer- 
ential treatment from a professor, is 


(a) 0.003. (c) 0.209. 
(b) 0.017. (d) 0.457. 
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24.23 


A 90% confidence interval for the proportion of athletes who have received pref- 
erential treatment from a professor is 


(a) 0.276 to 0.320. (c) 0.265 to 0.331. 
(b) 0.270 to 0.326. (d) 0.255 to 0.342. 


Very-low-birth-weight babies. Starting in the 1970s, medical technology allowed babies 
with very low birth weight (VLBW, less than 1500 grams, about 3.3 pounds) to survive without 
major handicaps. It was noticed that these children nonetheless had difficulties in school and as 
adults. A long-term study has followed 242 VLBW babies to age 20 years, along with a control 
group of 233 babies from the same population who had normal birth weight.’ At age 20, 179 of the 
VLBW group and 193 of the control group had graduated from high school. Use this information 
to answer Questions 24.24 to 24.29. 


24.24 


24.25 


24.26 


24.27 


24.28 


24.29 


This is an example of 


(a) an observational study. 

(b) a nonrandomized experiment. 
(c) a randomized controlled study. 
(d) a matched pairs experiment. 


Take pyipw and Peon to be the proportions of all VLBW and normal-birth- 
weight (control) babies who would graduate from high school. The hypotheses to 
be tested are 


(a) Ho: py aw = Peonerr Versus H,: py raw A Deontrols 

(b) Ho: pv aw = Peon Versus H,: py aw > Peonerol+ 

(c) Ho: py aw = Peonerot Versus Hy: py raw < Peontrols 

(d) Ho: pv raw > Peonerwi Versus H,: Py raw = Deonerote 

The pooled sample proportion of babies who would graduate from high school 
is 

(a) p = 0.74. (c) p = 0.81. 

(b) f = 0.78. (d) p = 0.83. 


The numerical value of the z test for comparing the proportions of all VLBW 
and normal-birth-weight (control) babies who would graduate from high 
school is 


(a) z = —1.65. (c) z = —2.77. 
(b) x = —2.34. (d) z = —3.14. 


IQ scores were available for 113 men in the VLBW group and for 106 men in the 
control group. The mean IQ for the 113 men in the VLBW group was 87.6, and 
the standard deviation was 15.1. The 106 men in the control group had mean 
IQ 94.7, with standard deviation 14.9. Is there good evidence that mean IQ is 
lower among VLBW men than among controls from similar backgrounds? To test 
this with a two-sample t test, the test statistic would be 

(a) t = —1.72. (c) t = —5.00. 

(b) t = —3.50. (d) t = —7.10. 

Of the 126 women in the VLBW group, 38 said they had used illegal drugs; 54 of 
the 124 control group women had done so. The IQ scores for the VLBW women 
who had used illegal drugs had mean 86.2 (standard deviation 13.4), and the 
normal-birth-weight controls who had used illegal drugs had mean IQ 89.8 (stan- 
dard deviation 14.0). Is there a statistically significant difference between the two 
groups in mean IQ? The P-value for this test is 


(a) less than 0.01. 
b) between 0.01 and 0.05. 
c) between 0.05 and 0.10. 


( 
( 
(d) greater than 0.10. 


Binge drinking. According to the National Institute on Alcohol Abuse and Alcoholism 
(NIAAA) and the National Institutes of Health (NIH), 41% of college students nationwide en- 
gage in “binge-drinking” behavior: having five or more drinks on one occasion during the past two 
weeks. A college president wonders if the proportion of students enrolled at her college who binge 
drink is actually lower than the national proportion. In a commissioned study, 348 students are 
selected randomly from a list of all students enrolled at the college. Of these, 132 admit to having 
engaged in binge drinking. Use this information to answer Questions 24.30 to 24.32. 


24.30 Based on the results of the commissioned study, a 95% confidence interval for the 
proportion of all students at this college who engage in binge drinking is 


(a) 0.328 to 0.430. (c) 0.341 to 0.420. 
(b) 0.338 to 0.423. (d) 0.343 to 0.418. 


24.31 The college president is more interested in testing her belief that the propor- 
tion of students at her college who engage in binge drinking is lower than the 
national proportion of 0.41. Her staff tests the hypotheses Hy: p = 0.41 versus 
Hyp < 0.41. The P-value is 


(a) between 0.15 and 0.20. 
(b) between 0.10 and 0.15. 
(c) between 0.05 and 0.10. 
(d) below 0.05. 


24.32 Which of the following conclusions is reasonable, based on the P-value computed 
in Exercise 24.31? 


(a) There is little evidence to support a conclusion that the proportion of 
students at this particular college who binge drink is lower than the national 
proportion of 0.41. 

(b) There is moderate but not strong evidence that the proportion of binge- 
drinking students at this college is lower than the national proportion of 0.41. 
(c) There is strong evidence that the proportion of students at this college who 
binge drink is lower than the national proportion of 0.41. 

(d) We can’t reach any reasonable conclusion, because the assumptions 
necessary for a significance test for a proportion are not met in this case. 


Who texts? Younger people use their cell phones to text more often than older people do. A 
random sample of 625 teens aged 12 to 17 who use their cell phones to text found that 475 sent 
more than 10 text messages in a typical day. In a random sample of 1917 adults aged 18 and over 
who use their cell phones to text, 786 sent more than 10 text messages in a typical day.’ Use this 
information to answer Questions 24.33 and 24.34. 


24.33 Give a 90% confidence interval for the proportion of all teens aged 12 to 17 
who use their cell phones to text and send more than 10 text messages in a 
single day. 


24.34 Give a 95% confidence interval for the difference between the proportions of cell 
phone users who use their cell phones to text and send more than 10 texts a day 
for these two age groups. 


24.35 Smokers in Alaska. In the past decade there have been intensive antismoking 
campaigns sponsored by both federal and private agencies. The Behavioral Risk 
Factor Surveillance System (BRFSS) is an ongoing data collection program that 
monitors behaviors such as smoking on a statewide level using data collected on a 
random sample of adults through a telephone survey.’ The first sample, taken in 2002 
in Alaska, involved 2045 adults, of which 754 were current smokers. The second sam- 
ple, taken in 2012 in Alaska, involved 2690 adults, of which 842 were smokers. The 
samples are to be compared to determine whether the proportion of U.S. adults in 
Alaska who smoke declined during the 10-year period between the samples. Taking 
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P2002 and p12 to be the proportions of all adults in Alaska over 18 who were current 
smokers in these years, state hypotheses, find the test statistic, and use either software 
or the bottom row of Table C for the P-value. Be sure to state your conclusion. 


Diet and bone density in cats. Some dietitians have suggested that highly acidic diets can 
have an adverse effect on bone density in humans. Alkaline diets have been marketed to avoid or 
counteract this effect. Is the same thing true for cats, and would an alkaline diet be beneficial? 
Two groups of four cats were fed diets for 12 months that differed only in acidifying or alkalinizing 
properties. The bone mineral density (g/cm’) of each cat was measured at the end of 12 months. 
The summary statistics for bone mineral density appear below."° 


Diet n x s 
Acidifying 4 0.63 0.01 
Alkalinizing 4 0.64 0.05 


Questions 24.36 to 24.39 are based on this study. 


24.36 A 95% confidence interval for the mean bone mineral density of cats after 12 
months on an acidifying diet is 


(a) 0.622 to 0.638. (c) 0.614 to 0.646. 
(b) 0.616 to 0.644. (d) 0.620 to 0.650. 


24.37 Is there strong evidence that cats on an alkalinizing diet have higher mean bone 
mineral density after 12 months than cats on an acidifying diet? To test this with 
a two-sample t test, the values of the t statistic and its degrees of freedom using 
conservative Option 2 are 


(a) t = 0.39, df = 3. (c) t = 0.01, df = 6. 
(b) t = 0.39, df = 4. (d) t = 0.01, df = 7. 


24.38 A 95% confidence interval for the difference in mean bone mineral density after 
12 months on an alkalinizing diet and an acidifying diet is (use conservative 
Option 2 for the degrees of freedom) 


(a) 0.01 to 0.05. (c) —0.05 to 0.07. 
(b) —0.02 to 0.04. (d) —0.07 to 0.09. 


24.39 What conditions must be satisfied to justify the procedures you used in Exercise 24.36? 
In Exercise 24.37? In Exercise 24.38? 


Choosing an inference procedure. In each of Questions 24.40 to 24.45, say which type 
of inference procedure from the Statistics in Summary flowchart (page 562) you would use, or 
explain why none of these procedures fits the problem. You do not need to carry out any procedures. 


24.40 Driving too fast. How seriously do people view speeding in comparison with 
other annoying behaviors? A large random sample of adults was asked to rate a 
number of behaviors ona scale of 1 (no problem at all) to 5 (very severe problem). 
Do speeding drivers get a higher average rating than noisy neighbors? 


24.41 Preventing drowning. Drowning in bathtubs is a major cause of death in chil- 
dren less than five years old. A random sample of parents was asked many ques- 
tions related to bathtub safety. Overall, 85% of the sample said they used baby 
bathtubs for infants. Estimate the percent of all parents of young children who use 


baby bathtubs. 


24.42 Acid rain? You have data on rainwater collected at 16 locations in the Adiron- 
dack Mountains of New York State. One measurement is the acidity of the water, 
measured by pH on a scale of 0 to 14 (the pH of distilled water is 7.0). Estimate 
the average acidity of rainwater in the Adirondacks. 
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24.43 Athletes’ salaries. Looking online, you find the base salaries of the 25 active 
players on the roster of the Chicago Cubs as of opening day of the 2014 baseball 
season. The total salary of these 25 players was $63.8 million, one of the lowest in 
the major leagues. Estimate the average salary of the 25 active players on the roster. 


24.44 Looking back on love. How do young adults look back on adolescent romance? 
Investigators interviewed 40 couples in their mid-twenties. The female and male 
partners were interviewed separately. Each was asked about his or her current 
relationship and also about a romantic relationship that lasted at least two months 
when they were aged 15 or 16. One response variable was a measure on a numeri- 
cal scale of how much the attractiveness of the adolescent partner mattered. You 


want to compare the men and women on this measure. 


24.45 Preventing AIDS through education. The Multisite HIV Prevention Trial 
was a randomized comparative experiment to compare the effects of twice-weekly 
small-group AIDS discussion sessions (the treatment) with a single one-hour ses- 
sion (the control). Compare the effects of treatment and control on each of the 


following response variables: 


(a) A subject does or does not use condoms six months after the education 
sessions. 

(b) The number of unprotected intercourse acts by a subject between four 

and eight months after the sessions. 

(c) A subject is or is not infected with a sexually transmitted disease six months 
after the sessions. 


Supplementary exercises apply the skills you have learned in 
ways that require more thought or more use of technology. Some 
of these exercises start from actual data rather than from data 
summaries. Many of these exercises ask you to follow the Plan, 
Solve, and Conclude steps of the four-step process. Remember 
that the Solve step includes checking the conditions for the infer- 
ence you plan. 


TIME x s 
63:2 Sei 
IGS) 17.4 


Before treatment 


After treatment 


(a) Which t procedures are correct for comparing 
the mean PedMIDAS scores before and after treat- 
ment: one-sample, matched pairs, or two-sample? 
(b) The data summary given is not enough information 
to carry out the correct t procedures. Explain why not. 


24.46 Do you have confidence? A report of a survey 
distributed to randomly selected email addresses 
at a large university says: “We have collected 427 
responses from our sample of 2,100 as of April 30, 
2004. This number of responses is large enough 
to achieve a 95% confidence interval with +5% 
margin of sampling error in generalizing the results 
to our study population.”'! Why would you be 
reluctant to trust a confidence interval based on 
these data? 


24.48 Monkeys and music. Humans generally prefer 
music to silence. What about monkeys? Allow a 
tamarin monkey to enter a V-shaped cage with food 
in both arms of the V. After the monkey eats the 
food, which arm will it prefer? The monkey’s loca- 
tion determines what it hears, a lullaby played by a 
flute in one arm and silence in the other. Each of 


four monkeys was tested six times, on different days 


24.47 


Treating migraines. Exercises 24.15 to 24.18 
involve a randomized clinical trial to investigate 
the effects of two treatments on migraines. One of 
the treatments, cognitive behavioral therapy (CBT) 
along with the drug amitriptyline, was expected to 
be particularly effective. For each of the 64 subjects 
on this treatment, severity of migraines was assessed 
using PedMIDAS scores. Here are summary data for 
these subjects: 


and with the music arm alternating between left and 
right (in case a monkey prefers one direction). The 
monkeys chose silence for about 65% of their time 
in the cage. The researchers reported a one-sample t 
test for the mean percent of time spent in the music 
arm, Ho: w = 50%, against the two-sided alterna- 
tiVeNta— Ont 925 a OLO0ONRZ 

Although the result is interesting, the statisti- 
cal analysis is not correct. The degrees of freedom 
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df = 23 show that the researchers assumed that they 
had 24 independent observations. Explain why the 
results of the 24 trials are not independent. 
(Optional topic) Drug-detecting rats? Dogs are 
big and expensive. Rats are small and cheap. Might 
rats be trained to replace dogs in sniffing out illegal 
drugs? A first study of this idea trained rats to rear 
up on their hind legs when they smelled simulated 
cocaine. To see how well rats performed after train- 
ing, they were let loose on a surface with many cups 
sunk in it, one of which contained simulated cocaine. 
Four out of six trained rats succeeded in 80 out of 80 
trials.’ How should we estimate the long-term success 
rate p of a rat that succeeds in every one of 80 trials? 


(a) What is the rat’s sample proportion f? What is 
the large-sample 95% confidence interval for p? It’s 
not plausible that the rat will always be successful, 
as this interval says. 

(b) Find the plus four estimate p and the plus four 
95% confidence interval for p. These results are 
more reasonable. 


(Optional topic) A new vaccine. In 2006, the 
pharmaceutical company Merck released a vaccine 
named Gardasil for human papilloma virus, the most 
common cause of cervical cancer in young women. 
The Merck website gives results from “four placebo- 
controlled, double-blind, randomized clinical stud- 
ies” with women 16 to 26 years of age, as follows:'# 


CERVICAL GENITAL 

n CANCER n WARTS 
Gardasil 8487 0 7897 1 
Placebo 8460 32 7899 91 


(a) Give a 99% confidence interval for the difference 
in the proportions of young women who develop 
cervical cancer with and without the vaccine. 

(b) Do the same for the proportions who develop 
genital warts. 

(c) What do you conclude about the overall effec- 
tiveness of the vaccine? 


Starting to talk. At what age do infants speak their 
first word of English? Here are data on 20 children 
(ages in months): | 1STWORD 


5 ss I IS OI aU 8 20 
7 iO iL TW) ale 


(In fact, the sample contained one more child, who 
began to speak at 42 months. Child development 
experts consider this abnormally late, so the investiga- 
tors dropped the outlier to get a sample of “normal” 
children. We are willing to treat these data as an SRS.) 
Is there good evidence that the mean age at first word 
among all normal children is greater than one year? 
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Fertilizing a 
tropical _ plant. 
Bromeliads are 
tropical —_flower- 
ing plants. Many 
are epiphytes that 
attach to trees 
and obtain mois- 
ture and nutrients 
from air and rain. 
Their leaf bases 
form cups that col- 
lect water and are 
home to the lar- 
vae of many insects. In an experiment in Costa Rica, 
Jacqueline Ngai and Diane Srivastava studied whether 
added nitrogen increases the productivity of bromeliad 
plants. Bromeliads were randomly assigned to nitrogen 
or control groups. Here are data on the number of new 
leaves produced over a seven-month period:!® 


G Newport/Science Source 
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Is there evidence that adding nitrogen increases the 
mean number of new leaves formed? 


Starting to talk, continued. Use the data in 
Exercise 24.51 to give a 90% confidence interval 
for the mean age at which children speak their first 


ently when dyed. This matters to clothing manufac- 
turers, who want the color of the fabric to be just 
right. A researcher dyed fabrics made of cotton and 
of ramie with the same “procion blue” dye applied in 
the same way. Then she used a colorimeter to mea- 
sure the lightness of the color on a scale in which 
black is 0 and white is 100. Here are the data for 
eight pieces of each fabric:'’ Sy) recpye 


Cotton 


48.82 48.88 48.98 49.04 48.68 49.34 48.75 49.12 


Ramie 


alles -Gelltes) APO) Salles Tell oa Spllsle ellse) 


24.55 


Is there a significant difference between the fabrics? 
Which fabric is darker when dyed in this way? 


More on dyeing fabrics. The color of a fabric 
depends on the dye used and also on how the dye 
is applied. This matters to clothing manufacturers, 
who want the color of the fabric to be just right. The 
study discussed in Exercise 24.54 went on to dye fab- 
ric made of ramie with the same procion blue dye 
applied in two different ways. Here are the lightness 
scores for eight pieces of identical fabric dyed in each 


way: E§jj,| FBCDYE2 


Method B 


40.98 40.88 41.30 41.28 41.66 41.50 41.39 41.27 


Method C 


42.30 42.20 42.65 42.43 42.50 42.28 43.13 42.45 


24.56 


24.57 


24.58 


(a) This is a randomized comparative experiment. 
Outline the design. 

(b) A clothing manufacturer wants to know which 
method gives the darker color (lower lightness 
score). Use sample means to answer this question. 
Is the difference between the two sample means 
statistically significant? Can you tell from just the 
P-value whether the difference is large enough to be 
important in practice? 


Do parents matter? A professor asked her sopho- 
more students, “Does either of your parents allow 
you to drink alcohol around him or her?” and “How 
many drinks do you typically have per session? (A 
drink is defined as one 12 oz beer, one 4 oz glass of 
wine, or one | oz shot of liquor.)” Table 24.1 con- 
tains the responses of the female students who are 
not abstainers.'’® The sample is all students in one 
large sophomore-level class. The class is popular, so 
we are tentatively willing to regard its members as 
an SRS of sophomore students at this college. Does 
the behavior of parents make a significant difference 
in how many drinks students have on the average? 
Elj;| FMDRINK 


Parents’ behavior. We wonder what proportion of 
female students have at least one parent who allows 
her to drink in the parent’s presence. Table 24.1 con- 
tains information about a sample of 94 students. Use 
this sample to give a 95% confidence interval for this 
proportion. Ej)y,) FMDRINK 


Diabetic mice. The body’s natural electrical field 
helps wounds heal. If diabetes changes this field, that 
might explain why people with diabetes heal slowly. 
A study of this idea compared normal mice and mice 
bred to spontaneously develop diabetes. The investi- 
gators attached sensors to the right hip and front feet 
of the mice and measured the difference in electrical 


24.59 
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potential (millivolts) between these locations. Here 
are the data:'” jy) mice 


DIABETIC MICE 


14.70 13.60 740 1.05 10.55 16.40 
10.00 22.60 15.20 19.60 17.25 18.40 
9.80 11.70 14.85 14.45 18.25 10.15 
LO:8 50:3 O04 5 8:55.85 ae L920 
NORMAL MICE 

13'S Ome 5 Oe 4325 (Om AO) 

7.20 10.00 14.55 13.30 6.65 

950 1040 7.75 870 8.85 

8.40 855 12.60 


(a) Make a stemplot of each sample of potentials. 
There is a low outlier in the diabetic group. Does it 
appear that potentials in the two groups differ in a 
systematic way? 

(b) Is there significant evidence of a difference in 
mean potentials between the two groups? 

(c) Repeat your inference without the outlier. Does 
the outlier affect your conclusion? 


Keeping crackers from breaking. We don’t like 
to find broken crackers when we open the package. 
How can makers reduce breaking? One idea is to 
microwave the crackers for 30 seconds right after 
baking them. Analyze the following results from 
two experiments intended to examine this idea.’° 
Does microwaving significantly improve indicators 
of future breaking? How large is the improvement? 
What do you conclude about the idea of microwav- 
ing crackers? 


(a) The experimenter randomly assigned 65 newly 
baked crackers to be microwaved and another 65 toa 
control group that is not microwaved. Fourteen days 
after baking, 3 of the 65 microwaved crackers and 57 
of the 65 crackers in the control group showed visible 
checking, which is the starting point for breaks. 


PARENT ALLOWS STUDENT TO DRINK 


2.5% — lk 29- 3 1 3 3 3 25 2.3 32: 9 2 
7 7 OS 4 8 6 66 3 © 3 4 7 5 
3.5 2 1 5 3 3 6 4 2 7 5 8 1 
6 5) Desh. « 3) 40 9 5 a 4 3 4 6 4 
5 1 5 3 10 i 4 4 4 4 2 29 “2.9 


PARENT DOES NOT ALLOW STUDENT TO DRINK 


3.5 3 5 1 1 


3 


4 


4 3 6 5 3 


4 4 5 7 7 


39 3 


10 at 9 2 7 
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(b) The experimenter randomly assigned 20 crackers 
to be microwaved and another 20 to a control group. 
After 14 days, he broke the crackers. Here are summa- 
ries of the pressure needed to break them, in pounds 
per square inch: 


MICROWAVE CONTROL 
Mean 139.6 720) 
Standard deviation 33.6 22.6 


Falling through the ice. Table 7.3 (page 195) 
gives the dates on which a wooden tripod fell through 
the ice of the Tanana River in Alaska, thus decid- 
ing the winner of the Nenana Ice Classic contest, 
for the years 1917 to 2013. Give a 95% confidence 
interval for the mean date on which the tripod falls 
through the ice. After calculating the interval in the 
scale used in the table (days from April 20, which is 
Day 1), translate your result into calendar dates and 
hours within the dates. (Each hour is 1/24, or 0.042, 
of a day.) jy) TANANA 


A case for the Supreme Court. In 1986, a Texas 
jury found a black man guilty of murder. The pros- 
ecutors had used “peremptory challenges” to remove 
10 of the 11 blacks and 4 of the 31 whites in the pool 
from which the jury was chosen.”! The law says that 
there must be a plausible reason (that is, a reason 
other than race) for different treatment of blacks and 
whites in the jury pool. When the case reached the 
Supreme Court 17 years later, the Court said that 
“happenstance is unlikely to produce this disparity.” 
Explain why the methods we know can’t be safely 
used to evaluate the inference that lies behind the 
Court’s finding that chance is unlikely to produce so 
large a black-white difference. 


Mouse genes. A study of genetic influences on 
diabetes compared normal mice with similar mice 
genetically altered to remove a gene called aP2. Mice 
of both types were allowed to become obese by eating 
a high-fat diet. The researchers then measured the 
levels of insulin and glucose in their blood plasma. 
Here are some excerpts from their findings.”’ The 
normal mice are called “wild-type” and the altered 
mice are called “aP2~/~.” 


Each value is the mean + SEM of measurements on 
at least 10 mice. Mean values of each plasma com- 
ponent are compared between aP2~/~ mice and 
wild-type controls by Student’s t test (P < 0.05 
and ““P < 0.005). 


PARAMETER WILD TYPE aP2-- 
Insulin (ng/dL) Oe) ae Oh sz Oy 
Glucose (mg/dL) 230525) 15 OReeaeli7ce 
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Despite much greater circulating amounts of in- 
sulin, the wild-type mice had higher blood glucose 
than the aP2~/~ animals. These results indicate 
that the absence of aP2 interferes with the develop- 
ment of dietary obesity-induced insulin resistance. 


Other biologists are supposed to understand the sta- 
tistics reported so tersely. 


(a) What does “SEM” mean? What is the expres- 
sion for SEM based on n, x, and s from a sample? 
(b) Which of the tests we have studied did the 
researchers apply? 

(c) Explain to a biologist who knows no statistics what 
P < 0.05 and P < 0.005 mean. Which is stronger 
evidence of a difference between the two types of mice? 


Mouse genes, continued. The report quoted in 
Exercise 24.62 says only that the sample sizes were 
“at least 10.” Suppose that the results are based on 
exactly 10 mice of each type. Use the values in the 
table to find x and s for the insulin concentrations in 
the two types of mice. Carry out a test to assess the 
significance of the difference in mean insulin con- 
centration. Does your P-value confirm the claim in 
the report that P < 0.005? 


(Optional topic) Which font? Plain type fonts 
such as Times New Roman are easier to read than 
fancy fonts such as Gigi. A group of 25 volunteer 
subjects read the same text in both fonts. (This is 
a matched pairs design. One-sample procedures 
for proportions, like those for means, are used to 
analyze data from matched pairs designs.) Of the 
25 subjects, 17 said that they preferred Times New 
Roman for web use. But 20 said that Gigi was more 
attractive.” 


(a) Because the subjects were volunteers, conclu- 
sions from this sample can be challenged. Show 
that the sample size condition for the large-sample 
confidence interval is not met, but that the condition 
for the plus four interval is met. 

(b) Give a 95% confidence interval for the 
proportion of all adults who prefer Times New 
Roman for web use. 

(c) Give a 90% confidence interval for the proportion 
of all adults who think Gigi is more attractive. 


Opinions about evolution. A sample survey funded 
by the National Science Foundation asked a random 
sample of American adults about biological evolution.”* 
One question asked subjects to answer “true,” “false,” 
or “not sure” to the statement, “Human beings, as we 
know them today, developed from earlier species of ani- 
mals.” Of the 1484 respondents, 594 said “true.” What 
can you say with 95% confidence about the percent of 
all American adults who think that humans developed 
from earlier species of animals? 
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Inference about 
Relationsnips 


tatistical inference offers more methods than anyone can know well, 
as a glance at the offerings of any large statistical software package 
demonstrates. In an introductory text, we must be selective. Parts | to 
IV have laid a foundation for understanding statistics: 


= The nature and purpose of data analysis. 
w The central ideas of designs for data production. 
= The reasoning behind confidence intervals and significance tests. 


= Experience applying these ideas in practice. 


Each of the three chapters of Part V offers an introduction to a more advanced 
topic in statistical inference. You may choose to read any or all of them, in any 
order. 

What makes a statistical method “more advanced"? More complex data, for 
one thing. In Part IV, we looked only at methods for inference about a single 
population parameter and for comparing two parameters. All of the chapters 
in Part V present methods for studying relationships between two variables. 
In Chapter 25, both variables are categorical, with data given as a two-way 
table of counts of outcomes. Chapter 26 considers inference in the setting of 
regressing a response variable on an explanatory variable. This is an important 
type of relationship between two quantitative variables. In Chapter 27, we meet 
methods for comparing the mean response in more than two groups. Here, the 
explanatory variable (group) is categorical and the response variable is quanti- 
tative. These chapters together bring our knowledge of inference to the same 
point that our study of data analysis reached in Chapters 1 to 7. 

With greater complexity comes greater reliance on technology. In these 
final three chapters, you will more often be interpreting the output of statistical 
software or using software yourself. With effort, you can do the calculations 
needed in Chapter 25 with a basic calculator. In Chapters 26 and 27, the pain 
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is too great and the contribution to learning too small. 
ideas without step-by-step arithmetic. 


Fortunately, you can grasp the 


Another aspect of “more advanced" methods is new concepts and ideas. This 
is where we draw the line in deciding what statistical topics we can master in a first 
course. Part V builds elaborate methods on the foundation we have laid without intro- 
ducing fundamentally new concepts. You can see that statistical practice does need 
additional big ideas by reading the sections on “the problem of multiple comparisons" 
in Chapters 25 and 27. But the ideas you already know place you among the world’s 


statistical sophisticates. 
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he two-sample z procedures of Chapter 23 allow us to compare the proportions 

of successes in two groups, either two populations or two treatment groups in 

an experiment. In the initial example of this chapter, we investigate where 
young people live. Suppose we want to compare the proportions of men and women 
who live with their parents. This is a question about the relationship between two 
categorical variables, sex (female or male) and “Where do you live?” (with parents 
or not), which can be answered using the two-sample zy procedures of Chapter 23. 
In fact, the data include three more outcomes for “Where do you live?”: in another 
person’s home, in your own place, and in group quarters such as a dormitory. When 
there are more than two outcomes, or when we want to compare more than two 
groups, we need a new statistical test. The new test addresses a general question: is 
there a relationship between two categorical variables? 


25.1 Two-way tables 


We saw in Chapter 6 that we can present data on two categorical variables in a 
two-way table of counts. That’s our starting point. Let’s begin our exploration of 
where college-age young people live. 


25.1 
25.2 


25.3 


25.4 
25.5 


25.6 


25.7 


25.8 
25.9 


Two-way tables 


The problem of multiple 
comparisons 


Expected counts in two-way 
tables 


The chi-square test statistic 


Cell counts required for the 
chi-square test 


Using technology 


Uses of the chi-square 
test: independence and 
homogeneity 


The chi-square distributions 


The chi-square test for 
goodness of fit* 
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LIVING 


cell 


A sample survey asked a random sample of young adults, “Where do you live 
now? That is, where do you stay most often?” Table 25.1 is a two-way table of all 
2984 people in the sample (both men and women) classified by their age and by 
where they lived.’ Living arrangement is a categorical variable. Even though age is 
quantitative, the two-way table treats age as dividing young adults into four catego- 
ries. Table 25.1 gives the counts for all 20 combinations of age and living arrange- 
ment. Each of the 20 counts occupies a cell of the table. m 


AGE (YEARS) 
LIVING ARRANGEMENT 19 20 21 22 TOTAL 
Parents’ home 324 378 337 318 1357 
Another person’s home 37 47 40 38 162 
Your own place 116 279 372 487 1254 
Group quarters 58 60 49 25 192 
Other 5 2 3 9 19 
Total 540 766 801 877 2984 


As usual, we prepare for inference by first doing data analysis. Because we 
think that age helps explain where young people live, find the percents of people 
in each age group who have each living arrangement. The percents appear in 
Table 25.2. Each column adds to 100% (up to roundoff error) because we are 
looking at each age group separately. In the language of Chapter 6 (page 166), 
Table 25.2 shows the four conditional distributions of living arrangements given 
a specific age. 

Figure 25.1 is a bar graph comparing the four conditional distributions. The 
graph shows a strong relationship between age and living arrangement. As young 
adults age from 19 to 22, the percent living with their parents drops and the percent 
living in their own place rises. The percent living in group quarters also declines 
with age as college students move out of dormitories. Are these differences among 
the four age groups large enough to be statistically significant? 


19 20 21 22 
Parents’ home 60.0% 49.3% 42.1% 36.3% 
Another person’s home 6.9% 6.1% 5.0% 4.3% 
Your own place 21.5% 36.4% 46.4% 55.5% 
Group quarters 10.7% 7.8% 6.1% 2.9% 
Other 0.9% 0.3% 0.4% 1.0% 
Total 100.0% 99.9% 100.0% 100.0% 
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FIGURE 25.1 

Bar graph comparing the four conditional 
60 distributions of living arrangements 
given age, for Example 25.1. 


Conditional distributions of living arrangements given age 


Percent 


Live 


Age 


Percent within levels of Age. 


Apply Your Knowledge 


25.1 Visitors to the Monterey Bay Aquarium. The Monterey Bay Aquarium, 
founded in 1984, is situated on the beautiful coast of Monterey Bay in the 
historic Cannery Row district. During 2009-13, the aquarium conducted a 
survey of a random sample of visitors as they exited the museum. The survey 
included visitor demographic information, use of social media, and opinions 
on their aquarium visit. Here is a comparison of the age distribution of His- 
panic and Caucasian visitors in 2013: AQUARIUM 


Age Category Caucasian Hispanic 
Under 25 181 79 
25 to 34 324 120 
35 to 44 232 74 
45 and over 571 44 


(a) What percent of Caucasian visitors are in each age category? What 
percent of Hispanic visitors are in each category? Each column 
should add to 100% (up to roundoff error). These are the conditional 
distributions of age category given ethnic group. 


(b) Make a bar graph that compares the two conditional distributions. 
What are the most important differences in age between the two 
ethnic groups? 


25.2 Video Gaming and Grades. The popularity of computer, video, online, 
and virtual reality games has raised concerns about their ability to negatively 
impact youth, although the existing literature has been inconsistent. The 
data in this exercise are based on a recent survey of 14- to 18-year-olds in 
Connecticut high schools. Here are the grade distributions of boys who have 
and have not played video games.’ GAMING 


= 
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HE STARTED IT! 


A study of deaths in bar 
fights showed that, in 90% of the 
cases, the person who died started 
the fight. You shouldn't believe this. 
If you killed someone in a fight, what 
would you say when the police asked 
you who started the fight? After all, 
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Grade Average 


As and Bs Cs Ds and Fs 
Played games 736 450 193 
Never played games 205 144 80 


(a) It appears that boys who have played video games have better grades 
than those who have never played video games. Give percents to back 
up this claim. Make a bar graph that compares your percents for boys 
who have and have not played video games. 

(b) Association does not prove causation. Explain why you can’t conclude 
from this study that playing video games improves grades for boys. 


the null hypothesis is that there is no relationship between two categorical variables: 


dead men tell no tales. Hp: there is no relationship between age and living arrangement for the popu- 


multiple comparisons 


lation of all American young adults 


The alternative hypothesis says that there is a relationship but does not specify any 


particular kind of relationship: 


H,: there is some relationship between age and living arrangement for the 
population of all American young adults 


Any difference among the four distributions of living arrangements in the popula- 
tion of all young adults means that the null hypothesis is false and the alternative 
hypothesis is true. The alternative hypothesis is not one-sided or two-sided. We 


might call it “many-sided” because it allows any kind of difference. 


With only the methods we already know, we might start by comparing the pro- 
portions of people aged 19 and 22 who live with their parents. We could similarly 
compare other pairs of proportions, ending up with many tests and many P-values. 
This is a bad idea. The P-values belong to each test separately, not to the collection 
of all the tests together. Think of the distinction between the probability that a bas- 
ketball player makes a free throw and the probability that she makes all of her free 


the individual P-values and confidence levels don’t tell us how confident we can 


A throws in a game. When we do many individual tests or confidence intervals, 


be in all of the inferences taken together. 


Because of this, it’s cheating to pick out one large difference from Table 25.2 
and then test its significance as if it were the only comparison we had in mind. 
For example, the percents of people aged 19 and 22 who live with their parents are 
significantly different (xy = 8.72, P < 0.001) if we make just this one comparison. 
But we could also pick a comparison that is not significant; for example, the pro- 
portions of people aged 21 and 22 who live in another person’s home do not differ 
significantly (zy = 0.64, P = 0.522). Individual comparisons can’t tell us whether 


the four distributions, each with five outcomes, are significantly different. 


The problem of how to do many comparisons at once with an overall measure 
of confidence in all our conclusions is common in statistics. This is the problem of 
multiple comparisons. Statistical methods for dealing with multiple comparisons 


usually have two steps: 


The null hypothesis in Example 25.1 is that, in the population of all American young 
adults, there is no difference among the four conditional distributions of living arrange- 
ments for people aged 19, 20, 21, and 22. If the null hypothesis is true, the differences in 
the sample are just accidents due to random selection of the sample. Put more generally, 
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1. An overall test to see if there is good evidence of any differences among the 
parameters that we want to compare. 


2. A detailed follow-up analysis to decide which of the parameters differ and to 
estimate how large the differences are. 


The overall test, though more complex than the tests we met earlier, is reasonably 
straightforward. The follow-up analysis can be quite elaborate. We will concentrate on 
the overall test and use data analysis to describe in detail the nature of the differences. 


Apply Your Knowledge 


25.3 Video Gaming and Grades. In the setting of Exercise 25.2, we might do 
several significance tests to compare the grades of boys who have played and 
never played video games. GAMING 


(a) Is there a significant difference between the proportions of Cs gotten by 
boys who have played and never played video games? Give the P-value. 


(b) Is there a significant difference between the proportions of Ds and Fs 
gotten by boys who have played and never played video games? Give 
the P-value. 


(c) Explain clearly why P-values for individual outcomes like these can’t 
tell us whether the two distributions for all three outcomes for boys 
who have played and never played differ significantly. 


25.4 Is Astrology Scientific? The University of Chicago’s General Social Sur- 
vey (GSS) is the nation’s most important social science sample survey. The 
GSS asked a random sample of adults their opinion about whether astrology 
is very scientific, sort of scientific, or not at all scientific. Here is a two-way 
table of counts for people in the sample who had three levels of higher educa- 
tion degrees:* am ASTRLGY 


Degree Held 


Junior College Bachelor Graduate 


Not at all scientific 44 122 Tal 
Very or sort of scientific 31 62 27 


(a) Give three 95% confidence intervals, for the percents of people with 
each degree who think that astrology is not at all scientific. 


(b) Explain clearly why we are not 95% confident that all three of these 
intervals capture their respective population proportions. 


Our general null hypothesis Ho is that there is no relationship between the two 
categorical variables that label the rows and columns of a two-way table. To test Ho, 
we compare the observed counts in the table with the expected counts, the counts 
we would expect—except for random variation—if Hy were true. If the observed 
counts are far from the expected counts, that is evidence against Hp. It is easy to 
find the expected counts. 


Expected Counts 


The expected count in any cell of a two-way table when H), is true is 


row total X column total 


expected count = 
P table total 
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Let’s find the expected counts for the study of where young people live. Look back 
at the two-way table of counts, Table 25.1. That table includes the row and column 
totals. The expected count of 19-year-olds who live in their parents’ home is 
row 1 total X column 1 total (1357)(540) 
table total 2984 


= 245.57 


The expected count of 22-year-olds who live with their parents is 


row 1 total X column 4 total (1357)(877) 
table total 2984 


= 398.82 


The actual counts are 324 and 318. More younger people and fewer older people 
live with their parents than we would expect if there were no relationship between 
age and living arrangement. Table 25.3 shows all 20 expected counts. 


AGE 
19 20 21 22 TOTAL 
Parents’ home 245.57 348.35 364.26 398.82 1357 
Another person’s home 29.32 41.59 43.49 47.61 162 
Your own place 226.93 321.90 336.61 368.55 1254 
Group quarters 34.75 49.29 51.54 56.43 192 
Other 3.44 4.88 5.10 5.58 19 
Total 540 766 801 877 2984 


As this table shows, the expected counts have exactly the same row and column 
totals (up to roundoff error) as the observed counts. That’s a good way to check your 
work. Comparing the actual counts (Table 25.1) and the expected counts (Table 
25.3) shows in what ways the data diverge from the null hypothesis. m 


Why the formula works Where does the formula for an expected count come 
from? Think of a basketball player who makes 70% of her free throws in the long run. 
If she shoots 10 free throws in a game, we expect her to make 70% of them, or seven of 
the 10. Of course, she won't make exactly seven every time she shoots 10 free throws 
in a game. There is chance variation from game to game. But in the long run, seven 
of 10 is what we expect. In more formal language, if we have n independent tries and 
the probability of a success on each try is p, we expect np successes. 

Now go back to the count of 19-year-olds living in their parents’ home. The 
proportion of all 2984 subjects who live with their parents is 


count of successes _ rowl total — 1357 


table total ~ table total 2984 
Think of this as p, the overall proportion of successes. If Hy is true, we expect 
(except for random variation) this same proportion of successes in all four 
age groups. So the expected count of successes among the 540 19-year-olds is 


1357 


np = (540)( 227) = 245.57 


That’s the formula in the Expected Counts box. 
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* The StatBoards video, Data Analysis and Expected Counts, provides an 
additional example of data analysis in a two-way table and the computation 
of expected counts. 


Apply Your Knowledge 


25.5 Visitors to the Monterey Bay Aquarium. The two-way table in 
Exercise 25.1 displays data on the relationship between age and ethnicity 
for a random sample of visitors to the Monterey Bay aquarium. The null 
hypothesis is that there is no relationship between ethnicity (Caucasian vs. 
Hispanic) and age category. AQUARIUM 


(a) If this hypothesis is true, what are the expected counts for the 
four age categories for Hispanic visitors? This is one column of the 
two-way table of expected counts. Find the column total and verify 
that it agrees with the column total for the observed counts. 


(b) Hispanic visitors tend to be younger than Caucasian visitors to the 
aquarium. How is this reflected when comparing the observed and the 
expected counts for Hispanic visitors? 


25.6 Video Gaming and Grades. Exercise 25.2 describes a comparison of the 
gerade distribution of a sample of 14- to 18-year-old boys in Connecticut who 
do and don’t play video games. The null hypothesis “no relationship” says 
that in the population of all 14- to 18-year-old boys in Connecticut, the pro- 
portions who have each grade average are the same for those who play and 
don’t play video games. GAMING 


(a) Find the expected cell counts if this hypothesis is true, and display 
them in a two-way table. Add the row and column totals to your table, 
and check that they agree with the totals for the observed counts. 

(b) Are there any large deviations between the observed counts and the 
expected counts? What kind of relationship between the two variables 
do these deviations point to? 


To test whether the observed differences among the four distributions of living 
arrangements given age are statistically significant, we compare the observed and 
expected counts. The test statistic that makes the comparison is the chi-square statistic. 


Chi-Square Statistic 


The chi-square statistic is a measure of how far the observed counts in a two-way 
table are from the expected counts if Hy were true. The formula for the statistic is 


x=> 


The sum is over all cells in the table. 


(observed count — expected count)? 


expected count 


As you might guess, the symbol y in the box is the Greek letter chi. The chi- 
square statistic is a sum of terms, one for each cell in the table. 
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In the study of where young people live, 324 19-year-olds lived with their parents. 
The expected count for this cell is 245.57. So the term of the chi-square statistic 
from this cell is 


(observed count — expected count)? (324 — 245.57)? 


expected count 245.57 
6151.26 
245,57 
The chi-square statistic x? is the sum of 20 terms like this one. Here they are, 
arranged to match the layout of the two-way table: 
25.05 + 2.53 + 2.04 + 16.38 
2.01 + 0.71 + 0.28 + 1.94 
54.23 + 5.72 + 3.72 + 38.07 
15.56 ++. 2.33) - O43 4 17-54 
0.71 + 1.70 + 0.87 + 2.09 
193.58 


= 25.05 


2 


X 


+ +4 4 


To find the value x? = 193.58, we had to calculate the 20 expected cell counts in 

Table 25.3 and then the 20 terms of the sum. Each term in the sum represents the 

“contribution” to the chi-square statistic for one of the cells in the table. Moreover, 
even rounding each term to two decimal places as we have done can still 

A create roundoff error in the sum. Because of this, software is very handy 
in finding x?. ™ 


Think of xy’? as a measure of the distance of the observed counts from the 
expected counts if Hy were true. Like any distance, it is always zero or positive, and 
it is zero only when the observed counts are exactly equal to the expected counts. 
Large values of x? are evidence against H) because they say that the observed 

counts are far from what we would expect if Hy were true. Although the 
A alternative hypothesis H, is many-sided, the chi-square test is one-sided because 

any violation of H, tends to produce a large value of 7. Small values of y” 
are not evidence against Ho. 


The chi-square test, like the z procedures for comparing two proportions, is an 
approximate method that becomes more accurate as the counts in the cells of the 
table get larger. We must therefore check that the counts are large enough to allow 
us to trust the P-value. Fortunately, the chi-square approximation is accurate for 
quite modest counts. Here is a practical guideline.’ 


Cell Counts Required for the Chi-Square Test 


You can safely use the chi-square test with critical values from the chi-square dis- 
tribution when no more than 20% of the expected counts are less than 5, and all 
individual expected counts are | or greater. In particular, all four expected counts in 
a2 X 2 table should be 5 or greater. 


25.6 Using Technology 


Note that the guideline uses expected cell counts. The expected counts for the 
living arrangements study of Example 25.1 appear in Table 25.3. Only 2 of the 20 
expected counts (that’s 10%) are less than 5, and all are greater than 1, so the data 
meet the guideline for safe use of chi-square. 


Calculating the expected counts and then the chi-square statistic by hand is 
time consuming. As usual, software saves time and always gets the arithmetic right. 
Figure 25.2 shows output for the chi-square test for the living arrangements data 
from a graphing calculator and three statistical programs. 


Texas Instruments Graphing Calculator 


X2£-Test 
*%2=193, 54827798 
P=6,981157%e-35 


df=12 


Minitab 


Fl Session | at is ot 


Expected counts are printed below observed counts 
Chi-Square contributions are printed below expected counts 


Age 19 Age 20 Age 21 Age 22 Total 


Parents 324 378 337 318 1357 
245.6 348.3 364.3 398.8 
25.049 Di 525 2.040 16.379 

Another uh 47 40 38 162 
ZONES: 41.6 43.5 47.6 
2.014 C2705 0.279 1.940 

OwnPlace 116 219) 372 487 1254 
226:..9 Sea) 336756: 368.6 
54.226 Sew rg 3.720 38.068 

Group 58 60 49 25) 92 
34.7 49.3 Byles) 56.4 
15.564 2329) Ond25 U7, 505 

Other 5 2 3 9 as) 
3.4 4.9 Bren: 56 
0:27:09 AO 0.865 2.090 

Total 540 766 801 877 2984 


Pearson Chi-Square = 193.548, DF = 12, P-Value = 0.000 
2 cells with expected counts less than 5. 


ae I 


FIGURE 25.2 


This key identifies 
| | the output for each 
cell in the table 


We have highlighted 
the six cells with the 


largest contribution 
to chi-square in red 


7A 


Output from a graphing calculator, Minitab, Crunchlt!, and JMP for the two-way table in the study of where young people live, for 


Example 25.4. 


585 


586 CHAPTER 25 @ Two Categorical Variables: The Chi-Square Test 


Crunchlt! 
——___—_____— - — A\¥) Contingency Table 
Results - Contingency Table 
Age 
Export * g 
Count Agel9 Age20 Age21 Age22 
Age19 Col% 
Expected 
Cell Chi*2 
Parents Parents 324 378 337 318 | 1357 
60.00 49.35 42.07 36.26 
245.57 | 348.345 | 364.262 | 398.823 
25.0491] 2.5245) 2.0403 | 16.3792 
37 Another 37 47 40 38] 162 
6.85 6.14 4.99 4.33 
Another 22.84 : : . 29.3164 | 41.5858 | 43.4859 | 47.6119 
oO 
6.852 iS 2.0138| 0.7049| 0.2794] 1.9405 
1.240 “4 | OwmPlace 116 279 372 487 |1254 
: 21.48 36.42 46.44 55.53 
116 226.93 | 321.905 | 336.613 | 368.552 
‘5 54.2260] 5.7185] 3.7201 | 38.0680 
OwnPlace 9.250 Group 58 60 49 wis) || al) 
21.48 : : : : 10.74 7.83 6.12 2.85 
3.887 f : 34.7453 | 49.2869|51.5389| 56.429 
15.5641] 2.3286] 0.1251/17.5048 
Other 5 ® 3 9 19 
Group 0.93 0.26 0.37 1.03 
3.43834 |4.87735| 5.1002|5.58412 
0.7093| 1.6975| 0.8648] 2.0895 
540 766 801 877 | 2984 
Other 
AlTests 
N DF -LogLike RSquare (U) 
2984 12 101.54293 0.0312 
Test ChiSquare Prob>ChiSq 
All Pearson 193.548 <.0001* 
Count 


This key identifies 
% of Row g the output for each 


% of Col cell in the table 
% of Total 


Chi-squared statistic: 193.5 


df: 12 
<0.0001 
FIGURE 25.2 
(Continued) 


All four outputs tell us that the chi-square statistic is ~* = 193.5, with very small 
P-value. Minitab reports P = 0.000, rounded to three decimal places, while Crunchit! 
and JMP report P < 0.0001. The graphing calculator says that P = 6.98 x 10°-*, 
a very small number indeed. This P-value comes from an approximation to the sam- 
pling distribution of y?. The approximation is less accurate far out in the tail than near 
the center of the distribution, so you should not take 6.98 X 107° literally. Just read 
itas “P is very small.” The sample gives very strong evidence that living arrangements 
differ among the four age groups. 

Statistical software generally offers additional information on request. We asked 
both Minitab and JMP to show the observed counts and expected counts and also 
the term in the chi-square statistic for each cell, called the “chi-square contribution” 
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or “cell chi*2.” We also told JMP to include the column percents associated with 
each count. The top left cell has expected count 245.57 and contributes 25.049 to 
the chi-square statistic, as we calculated earlier. (Roundoff errors are smaller with 
software than in hand calculation.) The graphing calculator also displays the observed 
and expected cell counts on request. We told the calculator to display the expected 
counts rounded to two decimal places. To see the remaining columns of expected 
counts on the calculator’s small screen, you must scroll to the right. Crunchlit! displays 
the counts, the row and column percents associated with each count, and the overall 
percent of the total associated with the count. Depending on the problem, only one 
of these percents will generally be of interest. 

What about the Excel spreadsheet program? Excel is, in general, a poor choice 
for statistics. It is particularly awkward for chi-square because its Data Analysis tool 
pack omits this common test. You will need add-in modules to use Excel effectively 
for chi-square. m 


The chi-square test is an overall test for detecting relationships between two 
categorical variables. If the test is significant, it is important to look at the data 
to learn the nature of the relationship. We have three ways to look at the living 
arrangements data: 


Compare selected percents: which living arrangements occur in quite different 
percents of the four age groups? This is the method we learned in Chapter 6. 


Compare observed and expected cell counts: which cells have more or fewer 
observations than we would expect if Hy were true? 


Look at the terms of the chi-square statistic: which cells contribute the 
most to the value of x’? 


There is very strong evidence (x7? = 193.55, P < 0.001) that living arrangements a 
of young people are not the same for ages 19, 20, 21, and 22. Comparing selected =I 1 1 
percents—specifically, the four conditional distributions of living arrangements for 
each age in Table 25.2 and Figure 25.1—shows how young people become more 
independent as they grow older. 

The additional information provided by programs like Minitab or JMP shows what 
differences among the age groups explain the large value of the chi-square statistic. 
Look at the 20 terms in the chi-square statistic in the Minitab output and compare 
the observed and expected counts in the cells that contribute most to chi-square. Just 
six of the 20 cells, those highlighted in red in Figure 25.2, contribute 166.79 of the 
total chi-square y* = 193.55. These six highlighted cells occur in pairs: 


54.226 and 38.068: fewer 19-year-olds than expected and more 22-year-olds 
than expected live in their own place. 


25.049 and 16.379: more 19-year-olds than expected and fewer 22-year-olds 
than expected live in their parents’ home. 


15.564 and 17.505: more 19-year-olds than expected and fewer 22-year-olds 
than expected live in group quarters. 


These three trends display the increase in independent living between age 19 and 
age 22. m 
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Apply Your Knowledge 


25.7 Visitors to the Monterey Bay Aquarium. Figure 25.3 displays SAS out- 
put for the two-way table in Exercise 25.1. The output includes the two-way 
table of observed counts, the expected counts, and each cell’s contribution 
to the chi-square statistic. AQUARIUM 


Frequency Table of Age_Group by Race 
Expected Race 
Cell Chi-Square 


Age_Group | Caucasian | Hispanic | Total 
Under 25 181 79 | 260 
209.28 50.72 
3.8215 15.768 
25 to 34 324 120 | 444 
357.39 | 86.614 
3.1188 | 12.869 


35 to 44 232 74 | 306 
246.31 59.694 
0.831 3.4288 

45 and over 571 44 |} 615 
495.03 | 119.97 
11.66} 48.109 

Total 1308 317 |1625 


Statistics for Table of Age_Group by Race 


Statistic DF Value Prob 
Chi-Square 3 | 99.6058 | <.0001 


FIGURE 25.3 
SAS output for the two-way table of visitor age by race, for Exercise 25.7. 


(a) Verify from the SAS output that the data meet the cell count requirement 
for use of chi-square. 


(b) What hypotheses does chi-square test? What are the test statistic and 
its P-value? 


A\~) Contingency Table 


(c) Which cells contribute the most to 77? Compare the observed and 
expected counts in these cells and comment on the most important 


E 3 - eg a differences in the age distribution of Hispanic and Caucasian visitors 
g 717.721 |453.056 |208.223 to the aquarium. 
a D 205 144 80) 429 
a seg el ie ao 25.8 Video Gaming and Grades. Your data analysis in Exercise 25.2 found 
941) 594) 273] 1808 that boys who have played video games tend to have higher grades than 
those who have not. Figure 25.4 gives JMP output for the two-way table in 
A Tests Exercise 25.2. £ij)| GAMING 


ane eal (a) Verify from the output that the data meet the cell count requirement 


Le OS Re 2ee sr 20 7 Lo) 


for use of chi-square. 


ecesae oT __ OES (b) What are the chi-square statistic and its P-value? Explain in simple 


FIGURE 25.4 language what it means to reject Hp in this setting. 


JMP output for the study of video (c) Give an overall conclusion that refers to row percents to describe the 
gaming and grades, for Exercise 25.8. nature of the relationship between playing video games and grades. 


25.7 Uses of the Chi-Square Test: Independence and Homogeneity 


25.9 Is Astrology Scientific? The General Social Survey asked a random sample 
of adults about their education and about their view of astrology as scientific 
or not. Here are the data for people with three levels of higher education 


degrees: ASTRLGY 


Degree Held 


Bachelor Graduate 


Junior College 


Not at all scientific 


Very or sort of scientific 31 62 27 


Figure 25.5 gives Minitab chi-square output for these data. Follow the Plan, 
Solve, and Conclude steps of the four-step process in using the information 
in the output to describe how people with these levels of education differ 
in their opinions about astrology. Be sure that your Solve step includes data 
analysis and checking conditions for inference as well as a formal test. 


H Session 
Junior BH 
college Bachelor Graduate All 
Not Science 44 bee) al 237 
5B Ou) 66.30 Hides Aly 66.39 
49.79 ID 25 65.06 23:7 200 
0.67329 0.00019 0.54255 ce 
Science Bale 62 DY. LANG) 
Ane '3 335.10 2155 icine 
25,ecak 685 32.94 210.00 
32975 0.00037 COWES :3 oy 
All TS 184 98 B57 
100.00 100.00 100.00 100.00 
15/010) 184.00 98.00 Bhar H06) 
* * * * 
€ell Contents: Count 


% of Column 
Expected count 
Contribution to Chi-square 


Pearson Chi-Square = 3.618, DF = 2, P-Value = 0.16 
5 ll 


FIGURE 25.5 
Minitab output for the two-way table of opinion about astrology by degree held, for Exercise 25.9. 


Two-way tables can arise in several ways. Most commonly, the subjects in a single 
sample are classified by two categorical variables. For example, we classified young 
adults by their age group and where they lived. The question of whether or not there 
is a relationship between these two classification variables can be stated in terms 
of the independence of these variables, using the definition of independence from 
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chi-square test for 
independence 


chi-square test for homogeneity 


our study of probability in Chapter 13. Recall that for two events, A and B, to be 
independent, we must have P(A) = P(A|B). In this setting, independence of age 
and living arrangement would imply that the conditional probability that a randomly 
selected young adult lives at home given their age would be the same for all ages. That 
is, knowing a young adult’s age gives us no information about the probability of liv- 
ing at home. The test we have been using to this point is generally referred to as the 
chi-square test for independence, as thus far all the examples and exercises have 
been questions about whether two classification variables are independent or not. 

The next example illustrates a different setting for a two-way table, in which we 
compare separate samples from two or more populations, or from two or more treat- 
ments in a randomized controlled experiment. In the setting of a randomized con- 
trolled experiment, we think of the different treatment groups as the separate popu- 
lations. “Which population” is now one of the variables for the two-way table. For 
each sample, we classify individuals according to one variable, and we are interested 
in whether or not the probabilities of being classified in each category of this variable 
are the same for each population. Although our calculations for the chi-square test are 
unchanged, the method of collecting the data is different. This use of the chi-square 
test is referred to as the chi-square test for homogeneity because we are interested in 
whether or not the populations from which the samples are selected are homogeneous 
(the same) with respect to the single classification variable. 


LIVECELL 


A.B./Getty Images 


(STATE! Random digit dialing (RDD) telephone surveys do not call cell phone num- 
bers. We know that cell-only users tend to be younger (see Exercise 25.31), which 
results in younger adults being underrepresented in an RDD sample. Survey organiza- 
tions can compensate for this by weighting the younger adults in the RDD sample 
more heavily, but this assumes that the young adults accessible by landline are simi- 
lar on the survey issue to their cell-only peers, who are excluded. There is growing 
evidence that this is not always the case. In young adults between 18 and 25 years 
of age, Pew interviewed separate random samples of cell-only and landline telephone 
users and compared them on demographic, lifestyle, and attitudinal issues.® Here are 
the results on the living situations for the two groups: 


Landline Sample Cell-Only Sample 
Live with parents 165 25 
Rent 95 74 
Own 46 13 
Live in a dorm 7 5) 
Other/refused 16 3 
Total 329 130 


N: Carry out a chi-square test for 


Ho: homogeneity; that is, the distribution of living situation is the same in both 
populations 


H,: lack of homogeneity; that is, the living situation distribution for the cell-only 
population differs from that for landline users 


Compare column percents or observed versus expected cell counts or terms of chi- 
square to see the nature of the differences in the distributions of living situations in 
the two populations. 


25.7 Uses of the Chi-Square Test: Independence and Homogeneity 


BQ RVA The Minitab output in Figure 25.6 includes the column percents. These give 
the distribution of living situation for each type of telephone use. Cell-only users are less 
likely to live with parents (19.23% versus 50.15% of landline users), more likely to rent 
(56.92% versus 28.88% of landline users), and more likely to live in a dorm (11.54% 
versus 2.13% of landline users). There is little difference in the percentages for the other 
two categories. To see if the differences are significant, first check the guidelines for use 
of chi-square. The samples can be assumed to be SRSs. The Minitab output shows that 
the expected counts in all of the 10 cells are greater than 5, so the conditions for use 
of chi-square (page 584) are satisfied. The chi-square test shows a highly significant 
difference between the distributions of living situations of the two groups of young adults 
(x? = 61.63, P = 0.000). Comparing observed and expected cell counts again shows 
that cell-only young adults are less likely to live with parents than would be expected if 
the null hypothesis of homogeneity were true, and more likely to rent or live in a dorm. 
The contributions to chi-square for the other two categories are quite small. 


(SS NSRUSM The association between living situation and type of telephone service 
is statistically significant. Young adults accessible by landline phone are more likely 
to live with their parents and less likely to rent or live in a dorm. The Pew study also 
found young adults with landlines tend to be more likely to attend religious services 
at least once a week and less likely to report drinking alcohol in the past seven 
days or to say that it is okay for people to smoke marijuana. They also tend to be 


fa, Session 
Landline Cell-only J 

sample sample All 
Live with parents L65: 25 190 
BiOn dS O23 pees) 
abexson ale) 5Se suk 190.00 
6.096 15.427 * 
Rent 25 74 169 
28.88 SiGm 2 36.82 
121.14 47.86 169.00 
5.639 14.270 * 
Own 46 13 59 
13918) 10.00 25.85: 
42.29 Gehl 59.00 
On 2'6, 0.824 * 
Live in a dorm a 1S 22 
Dieles 11.54 4.79 
Abeer ies 61.23) 22.00 
4.876 12.341 x 
Other/refused 16 3 19 
4.86 Die Sill 4.14 
eS n6 2 Sis ciel 19:00: 
0.416 1.054 x 
All 329 130 459 
100.00 100.00 100.00 
3:2:9:2:010 130.00 459.00 
* * * 

Cell Contents: Count 


2 


S of Column 
Expected count 
Contribution to Chi-square 


Pearson Chi-Square = 61.269, DF = 4, P-Value = 0.000 
pa | 
ie aA 
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FIGURE 25.6 
Minitab output for the two-way table 
of living situation and telephone use, 
for Example 25.6. 
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TESTS 


There are other chi-square tests for 
hypotheses more specific than “no 
relationship.” A sociologist places 
people in classes by social status, 
waits 10 years, then classifies the 
same people again. The row and 
column variables are the classes at 
the two times. She might test the 
hypothesis that there has been no 
change in the overall distribution 
of social status in the group. Or she 
might ask if moves up in status are 
balanced by matching moves down. 
These and other null hypotheses can 
be tested by variations of the chi- 
square test. 


MORE 
CHI-SQUARE 


less technology savvy using email, texting, and visiting social networking sites less 
frequently. Because of the rapid increase in cell-only users plus the differences in a 
number of dimensions between cell-only users and their landline peers, most survey 
organizations now routinely include a cell-only sample when conducting a survey.’ m 


One of the most useful properties of chi-square is that it tests the null hypothesis 
“the row and column variables are not related to each other” whenever this hypoth- 
esis makes sense for a two-way table. It makes sense when we are comparing a cat- 
egorical response in two or more samples, as when we compared people who have 
only a cell phone with people who have a landline phone. This is the chi-square test 
for homogeneity. The hypothesis also makes sense when we have data on two cat- 
egorical variables for the individuals in a single sample, as when we examined age 
group and living arrangement for a sample of young adults. This is the chi-square 
test for independence. Statistical significance has the same intuitive meaning in 
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both settings: “A relationship this strong is not likely to happen just by chance.” 


Uses of the Chi-Square Test 


Use the chi-square test to test the null hypothesis 


Hp: there is no relationship between two categorical variables 


when you have a two-way table from one of these situations: 


» Asingle SRS, with each individual classified according to both of two categorical vari- 
ables. In this case, the null hypothesis of no relationship says that the two categorical 
variables are independent, and the test is called the chi-square test of independence. 


«= Independent SRSs from two or more populations, with each individual clas- 
sified according to one categorical variable. (The other variable says which 
sample the individual comes from.) In this case, the null hypothesis of no 
relationship says the populations are homogeneous, and the test is called 
the chi-square test of homogeneity. 


() LaunchPad Online Resources 


* The StatBoards videos, The Chi-Square Test for Independence and The 
Chi-Square Test for Homogeneity, provide applications of the chi-square test 
in the settings of independence and homogeneity. 


Apply Your Knowledge 


25.10 Do You Use Cocaine? Sample surveys on sensitive issues can give differ- 


25.11 


ent results depending on how the question is asked. A University of Wiscon- 
sin study divided 2400 respondents into three groups at random. All were 
asked if they had ever used cocaine. One group of 800 was interviewed by 
phone; 21% said they had used cocaine. Another 800 people were asked the 
question in a one-on-one personal interview; 25% said “Yes.” The remaining 
800 were allowed to make an anonymous written response; 28% said “Yes.”® 
Are there statistically significant differences among these proportions? 


(a) Convert the information given into a two-way table of counts. 
(b) Should you use a chi-square test of independence or homogeneity? 
(c) Doacomplete analysis of these data, following the four-step process. 


Who’s More Informed About Politics? In 2012, the General Social 
Survey asked a random sample of adults, “Compared to most people, how 
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informed are you about politics?” Here are the data classified by their re- 
sponses to this question and their age group: 


Not at All ALittle Somewhat Very Extremely 
Age20-29 | 8 29 28 0 
Age 30-39 15 28 55 23 9 
Age 40-49 2 2» 49 26 14 
Age 50 or older 21 59 120 79 23 


(a) Should you use a chi-square test of independence or homogeneity? 
Explain. 


(b) Doa complete analysis of these data, following the four-step process. 


Software usually finds P-values for us. The P-value for a chi-square test comes 
from comparing the value of the chi-square statistic with critical values for a chi- 
square distribution. 


The Chi-Square Distributions 


The chi-square distributions are a family of distributions that take only positive 
values and are skewed to the right. A specific chi-square distribution is specified by 
giving its degrees of freedom. 

The chi-square test for a two-way table with r rows and c columns uses critical val- 
ues from the chi-square distribution with (r — 1)(c — 1) degrees of freedom. The 
P-value is the area under the density curve of this chi-square distribution to the right 
of the value of the test statistic. 


Figure 25.7 shows the density curves for three members of the chi-square family 
of distributions. As the degrees of freedom increase, the density curves become less 


FIGURE 25.7 

Density curves for the chi-square 
distributions with 1, 4, and 8 degrees 

of freedom. Chi-square distributions take 
only positive values and are right-skewed. 
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skewed, and larger values become more probable. Table D in the back of the book 
gives critical values for chi-square distributions. You can use Table D if you do not 
have software that gives you P-values for a chi-square test. 


The two-way table of five outcomes by four age groups for the living arrangements 
study (Table 25.1) study has five rows and four columns. That is, r= 5 and c = 4. 
The chi-square statistic therefore has degrees of freedom 


(r- 1) - 1) = — 1)(4 = 1) = (4)(8) = 12 


[- All four outputs in Figure 25.2 give 12 as the degrees of freedom. 
p 0.001 0.0005 The observed value of the chi-square statistic is y* = 193.55. Look in the 
x* 32.91 34.82 df = 12 row of Table D. The value v* = 193.55 falls above the largest critical value 


in the table, for P = 0.0005. Remember that the chi-square test is always one-sided. 
So the P-value of vy? = 193.55 is less than 0.0005. m 


We know that all z and t statistics measure the size of an effect in the standard 
scale centered at zero. We can roughly assess the size of any z or t statistic by the 
68—95-99.7 rule, though this is exact only for z. The chi-square statistic does not 
have any such natural interpretation. But here is a helpful fact: the mean of any chi- 
square distribution is equal to its degrees of freedom. In Example 25.7, y? would have 
mean 12 if the null hypothesis were true. The observed value x? = 193.55 is so 
much larger than 12 that we suspect it is significant even before we look at Table D. 


Apply Your Knowledge 


25.12 Visitors to the Monterey Bay Aquarium. The SAS output in 
Figure 25.3 gives the degrees of freedom for a table of age category vs. two 
ethnic groups of visitors (Caucasian and Hispanic) as DF 3. Ey, 


(a) Show that this is correct for a table with four rows and two columns. 


(b) SAS gives the chi-square statistic as Chi-Sq = 99.6058. Where 
does this value fall when compared with critical values of the chi- 
square distribution with three degrees of freedom in Table D? How 
does SAS’s result P-Value < 0.0001 compare with the P-value 
from the table? 


(c) The original table included four ethnic groups, an additional column 
for Asians/Pacific Islanders and a second additional column for African 
Americans, and the same categories for age. What is the proper 
degrees of freedom for this table? 


25.13 Video Gaming and Grades. The JMP output in Figure 25.4 gives two 
degrees of freedom for the table in Exercise 25.2. 


(a) Verify that this is correct. 

(b) The computer gives the value of the chi-square statistic as x” = 6.739. 
Between what two entries in Table D does this value lie? Verify that 
JMP’s P-value does fall between the tail probabilities p for these two 
entries. 


(c) What is the mean value of the statistic 7 if the null hypothesis is 
true? How does the observed value of 7’ compare with this mean? 
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The most common and most important use of the chi-square statistic is to test 
the hypothesis that there is no relationship between two categorical variables. A varia- 
tion of the statistic can be used to test a different kind of null hypothesis: that a 
categorical variable has a specified distribution. Here is an example that illustrates this 
use of chi-square. 


Births are not evenly distributed across the days of the week. Fewer babies are born 
on Saturday and Sunday than on other days, probably because doctors find weekend 
births inconvenient. 

A random sample of 140 births from local records shows this distribution across 
the days of the week: In 


DATA 


Day Sun. Mon. Tue. Wed. Thu. Fri. Sat. BIR? 
Births 13 23 24 20 27 18 15 


Sure enough, the two smallest counts of births are on Saturday and Sunday. Do these 
data give significant evidence that local births are not equally likely on all days of 
the week? = 


The chi-square test answers the question of Example 25.8 by comparing observed 
counts with expected counts under the null hypothesis. The null hypothesis for 
births says that they are evenly distributed. To state the hypotheses carefully, write 
the discrete probability distribution for days of birth: 


Day Sun. Mon. Tue. Wed. Thu. Fri. Sat. 
Probability p, bo D3 D4 Ds Do D7 


The null hypothesis says that the probabilities are the same on all days. In that case, 
all seven probabilities must be 1/7. So the null hypothesis is 


Ho: pi = p2 = D3 = py = ps = Po = py 7 


The alternative hypothesis says that days are not all equally probable: 


1 
H,: not all p; = 7 


As usual in chi-square tests, H, is a “many-sided” hypothesis that simply says that 
Ho is not true. The chi-square statistic is also as usual: 


rv=> 


The expected count for an outcome with probability p is np, as we saw in the 
discussion following Example 25.2. Under the null hypothesis, all the probabilities 
p; are the same, so all seven expected counts are equal to 


(observed count — expected count)’ 


expected count 


1 
np, = 140 x = = 20 


*This special topic is optional. 
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df = 6 
bp | 0.25 0.20 
x*| 7.84 8.56 


ny CHI-SQUARE IN 
it THE CASINO 


Gambling devices such as slot 
machines and roulette wheels are 
supposed to have a fixed and known 
distribution of outcomes. Here’s a job 
for the chi-square test of goodness 
of fit: state gambling regulators 

use it to verify that casino devices 
are honest. How much deviation a 
casino can get away with depends on 
the state. Nevada cracks down if chi- 
square is significant at the 5% level. 
Mississippi gives more leeway, acting 
only when the 1% level is reached. 


These expected counts easily satisfy our guidelines for using chi-square. The chi- 
square statistic is 


(observed count — 20)? 
2— 
v= 2 20 
(13 — 20)? (23 — 20) (15 — 20) 
+ + eee + —— Ee 
20 20 20 


This new use of x” requires a different degrees of freedom. To find the P-value, 
compare x? with critical values from the chi-square distribution with degrees 
of freedom one less than the number of values the birth day can take. That’s 
7 — 1 = 6 degrees of freedom. From Table D, we see that 7? = 7.6 is smaller than 
the smallest entry in the df = 6 row, which is the critical value for tail area 0.25. 

The P-value is therefore greater than 0.25 (software gives the more exact value 
P = 0.269). These 140 births don’t give convincing evidence that births are not 
equally likely on all days of the week. 

The chi-square test applied to the hypothesis that a categorical variable has a 
specified distribution is called the test for goodness of fit. The idea is that the test 
assesses whether the observed counts “fit” the distribution. The chi-square statistic 
is the same as for the two-way table test, but the expected counts and degrees of 
freedom are different. Here are the details. 


The Chi-Square Test for Goodness of Fit 


A categorical variable has k possible outcomes, with probabilities p,, po, p3,..- 5 De 
That is, p; is the probability of the ith outcome. We have n independent observations 
from this categorical variable. 


To test the null hypothesis that the probabilities have specified values 


Jalen = Pigh Da = Day ooo), = ibe 


find the expected count for the ith possible outcome as np;y and use the chi-square 
statistic 


(observed count — expected count)? 


x=> 


The sum is over all the possible outcomes. 


expected count 


The P-value is the area to the right of y’ under the density curve of the chi-square 
distribution with k — 1 degrees of freedom. 


In Example 25.8, the outcomes are days of the week, with k = 7. The null 
hypothesis says that the probability of a birth on the ith day is pi» = 1/7 for all days. 
We observe n = 140 births and count how many fall on each day. These are the 
counts used in the chi-square statistic. 


() LaunchPad Online Resources 


* The StatBoards video, The Chi-Square Test for Goodness of Fit, provides 
an application of the chi-square test to the problem of goodness of fit. 
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Apply Your Knowledge 


25.14 Saving Birds from Windows. Many birds are injured or killed by flying 
into windows. It appears that birds don’t see windows. Can tilting windows 42 
down so that they reflect earth rather than sky reduce bird strikes? Place six 
windows at the edge of a woods: two vertical, two tilted 20 degrees, and two 
tilted 40 degrees. During the next four months, there were 53 bird strikes: 
31 on the vertical windows, 14 on the 20-degree windows, and 8 on the 
40-degree windows.’ If the tilt has no effect, we expect strikes on windows 
with all three tilts to have equal probability. Test this null hypothesis. What 
do you conclude? 

25.15 More on Birth Days. Births really are not evenly distributed across the 
days of the week. The data in Example 25.8 failed to reject this null hy- 
pothesis because of random variation in a quite small number of births. 
Here are data on 700 births in the same locale: BIRTH140 


Day Sun. Mon. Tue. Wed. Thu. Fri. Sat. 
Births 84 110 124 104 94 We 7 


© Blend Images/Alamy 


(a) The null hypothesis is that all days are equally probable. What are 
the probabilities specified by this null hypothesis? What are the 
expected counts for each day in 700 births? 


(b) Calculate the chi-square statistic for goodness of fit. 


(c) What are the degrees of freedom for this statistic? Do these 700 births 
give significant evidence that births are not equally probable on all 
days of the week? 


25.16 Police Harassment? Police may use minor violations such as not 
wearing a seat belt to stop motorists for other reasons. A large study in 
Michigan first studied the population of drivers not wearing seat belts 
during daylight hours by observation at more than 400 locations around 
the state. Here is the population distribution of seat belt violators by age 
group:!° 


Age group 16to29 30to59 60 or older 
Proportion 0.328 0.594 0.078 


The researchers then looked at court records and called a random sample 
of 803 drivers who had actually been cited by police for not wearing a seat 
belt. Here are the counts: 


Age group 16 to 29 30 to 59 60 or older 
Count 401 382 20 


Does the age distribution of people cited differ significantly from the distri- 
bution of ages of all seat belt violators? Which age groups have the largest 
contributions to chi-square? Are these age groups cited more or less fre- 
quently than is justified? (The study found that males, blacks, and younger 
drivers were all overcited.) 


25.17 Order in Choice. Does the order in which wine is presented make a differ- 
ence? Several choices of wine are presented one at a time and in sequence, 42 
and the subject is then asked to choose their preferred wine at the end of 
the sequence. In this study, subjects were asked to taste four wine samples in 
sequence. All four samples given to a subject were the same wine, although 
subjects were expecting to taste four different samples of a particular variety.!! 
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There were 33 subjects in the study, and the positions in the sequence they 


selected for their preferred wine were 


Position 1 2 3 4 
Count ib 5 ae lale 


(a) What percent of the subjects chose each position? 


(b) If the subjects were equally likely to select each position, what are 


the expected counts for each position? 


(c) Does the chi-square test for goodness of fit give good evidence that 
the subjects were not equally likely to choose each position? (State 
hypotheses, check the guidelines for using chi-square, give the test 


statistic and its P-value, and state your conclusion.) 


(d) The primacy effect is a tendency for subjects to choose the first wine 
tasted, whereas the recency effect is a tendency for subjects to choose 
the most recent wine tasted. Are either of these effects present in this 


data? 


25.18 What’s Your Sign? For reasons known only to social scientists, the Gen- 
eral Social Survey (GSS) regularly asks its subjects their astrological sign. 


Here are the counts of responses for the 2012 GSS: SIGNS 


Sign Aries ‘Taurus Gemini Cancer Leo 
Count 145 162 163 148 186 
Sign Libra Scorpio Sagittarius Capricorn Aquarius 
Count 178 147 145 151 170 


If births are spread uniformly across the year, we expect all 12 signs to be 
equally likely. Are they? Follow the four-step process in your answer. 


CHAPTER 25 SUMMARY 


Chapter Specifics 


mu The chi-square test for a two-way table tests the null hypothesis Hy that there is 
no relationship between the row variable and the column variable. The alternative 
hypothesis H, says that there is some relationship but does not say what kind. When 
the two-way table results from classifying individuals in a single SRS according to two 
categorical variables, no relationship says the two categorical variables are indepen- 
dent, and the test is called the chi-square test of independence. If the two-way table 
is formed from independent SRSs from two or more populations with each individual 
classified according to one categorical variable, no relationship says the populations are 


the same and the test is called the chi-square test of homogeneity. 


m The test compares the observed counts of observations in the cells of the table with the 
counts that would be expected if Hy were true. The expected count in any cell is 


row total X column total 


expected count = 
7 table total 


m The chi-square statistic is 


x=> 


(observed count — expected count)’ 


expected count 


m The chi-square test compares the value of the statistic x’ with critical values from the 
chi-square distribution with (r — 1)(c — 1) degrees of freedom. Large values of x” 
are evidence against Ho, so the P-value is the area under the chi-square density curve 
to the right of x’. 


m The chi-square distribution is an approximation to the distribution of the statistic x’. 
You can safely use this approximation when all expected cell counts are at least 1 and 
no more than 20% are less than 5. 

m If the chi-square test finds a statistically significant relationship between the row 
and column variables in a two-way table, do data analysis to describe the nature of 
the relationship. You can do this by comparing well-chosen percents, comparing the 
observed counts with the expected counts, and looking for the largest terms of the 
chi-square statistic. 


STATISTICS IN SUMMARY 
Here are the most important skills you should have acquired from reading this chapter. 


A. Two-Way Tables 


1. Understand that the data for a chi-square test must be presented as a two-way table 
of counts of outcomes. 

2. Recognize that a two-way table can occur from a single SRS with each individual 
classified according to two categorical variables, or from independent SRSs from 
two or more populations with each individual classified according to one categorical 
variable. 

3. Use percents to describe the relationship between the row and column variables, 
starting from the counts in a two-way table. 


B. Interpreting Chi-Square Tests 
1. Locate the chi-square statistic, its P-value, and other useful facts (row or column per- 
cents, expected counts, terms of chi-square) in output from your software or calculator. 
2. Use the expected counts to check whether you can safely use the chi-square test. 


3. Explain what null hypothesis the chi-square statistic tests in a specific two-way 
table. Understand the difference between testing for independence or homogeneity 
in a two-way table. 


4. If the test is significant, compare percents, compare observed with expected cell 
counts, or look for the largest terms of the chi-square statistic to see what deviations 
from the null hypothesis are most important. 


C. Doing Chi-Square Tests by Hand 


1. Calculate the expected count for any cell from the observed counts in a two-way 
table. Check whether you can safely use the chi-square test. 

2. Calculate the term of the chi-square statistic for any cell, as well as the overall 
statistic. 

3. Give the degrees of freedom of a chi-square statistic. Make a quick assessment of the 
significance of the statistic by comparing the observed value with the degrees of freedom. 


4. Use the chi-square critical values in Table D to approximate the P-value of a 
chi-square test. 


Link It 


The final portion of the text studies relationships between variables. In this chapter, the case 
of two-way tables is considered and a formal test for answering the question “Is there a rela- 
tionship between the row and column variables?” is developed. As with procedures described 


Statistics in Summary 
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in earlier chapters, we must first consider how the data were produced, as this plays an im- 
portant role in the conclusions we can reach. In addition, we should begin with data analysis 
rather than a formal test. In the case of two-way tables, this typically involves looking at 
percents, both numerically and graphically, to first understand the nature of the relationship. 
When considering the relationship between the age of young adults and living arrangements 
in Example 25.1, we can see from our data analysis that as young adults age from 19 to 22, the 
percent living with their parents drops as the percent living in their own place rises. 

Even though there appears to be a clear relationship between age and living arrange- 
ments in Example 25.1, we must still answer the question of whether the observed differ- 
ences are large enough to be statistically significant. The chi-square test can be used for 
this, but it is an approximate procedure, and the conditions for cell sizes need to be checked 
before applying the test. Unlike some of the simpler procedures in earlier chapters, if the 
differences are statistically significant, then we have only determined from the chi-square 
test that there is evidence of a relationship, not the nature of the relationship. Although 
there are formal statistical procedures to further investigate the nature of the relationship, 
at this point we need to be satisfied describing the relationship between the two categorical 
variables using our data analysis tools. 


() LaunchPad Online Resources 
If you are having difficulty with any of the sections of this chapter, these online 
resources should help prepare you to solve the exercises at the end of this chapter. 


° StatTutor starts with a video review of each section and asks a series of 
questions to check your understanding. 


* LearningCurve provides you with a series of questions about the chapter 
geared to your level of understanding. 


| CHECK YOUR SKILLS 


Resistance training is a pop- 
ular form of conditioning 
aimed at enhancing sports 
performance and is widely 
used among high school, 
college, and __ professional 
athletes, although its use for 
younger athletes is contro- 
versial. A random sample 
of 4111 patients aged 8-30 
admitted to U.S. emergency 
rooms with the injury code 
“weightlifting” were ob- 
tained. These injuries were 
classified as “accidental” if 
caused by dropped weight 
or improper equipment use. 
The patients were also clas- 


sified into the four age categories “8-13,” “14-18,” “19-22,” and 


“23-30.” Here is a two-way table of the results: 


25.19 


25.20 


25.21 


25.22 


CHAPTER 25 EXERCISES 


LIFTING 


ACCIDENTAL NOT ACCIDENTAL 
8-13 22D 102 

14-18 655 916 

19-22 239 53}3) 

23-30 363 1008 


The number of “accidental” injuries in the sample is 
(a) 1552. (b) 2559. (c) 4111. 


The percent of the 14- to 18-year-olds in the sam- 
ple whose injuries were classified as “accidental” is 
about 


(a) 42.2%. (b) 41.7%. (c) 74.3%. 


The percent of the 14- to 18-year-olds in the sample 
whose injuries were classified as “accidental” is 


(a) higher than the percent for the 23- to 30-year-olds. 
(b) about the same as the percent for the 23- to 
30-year-olds. 

(c) lower than the percent for the 23- to 30-year-olds. 
For this two-way table, we should do a chi-square 
test of 


(a) homogeneity. (b) independence. 


(c) goodness of fit. 


If you have access to software or a graphing calculator, use it to 
speed your analysis of the data in these exercises. Exercises 25.30 
to 25.35 are suitable for hand calculation if necessary. 


25.30 


Smoking cessation. A large randomized trial was 
conducted to assess the efficacy of Chantix® for 
smoking cessation compared with bupropion (more 


Corbis/Superstock 


25.23 


25.24 


25.25 


25.26 


25.27 


25.28 


25.29 


Exercises 
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The expected count of the 14- to 18-year-olds whose 
injuries were classified as “accidental” is about 


(a) 593.09. (b) 655. (c) 977.91. 


The term in the chi-square statistic for the cell of the 
14- to 18-year-olds whose injuries were classified as 
“accidental” is about 


(a) 593.09. (b) 3.919. (c) 6.463. 


The degrees of freedom for the chi-square test for 
this two-way table are 


(a) 3. (b) 4. (c) 8. 


The null hypothesis for the chi-square test for this 
two-way table is 


(a) the proportions of “accidental” and “not accidental” 
injuries are the same. 

(b) there is no difference in the conditional probability 
of an “accidental” injury for each of the four age groups. 
(c) “accidental” injuries are more likely for the 
younger age groups. 

The alternative hypothesis for the chi-square test for 
this two-way table is 


(a) the proportions of “accidental” and “not accidental” 
injuries are different. 

(b) the conditional probabilities of an “accidental” 
injury for each of the four age groups are not the same. 
(c) “accidental” injuries are more likely for the 
younger age groups. 

Software gives chi-square statistic vy? = 325.459 for 
this table. From the table of critical values, we can 
say that the P-value is 


(a) between 0.0025 and 0.001. 
(b) between 0.001 and 0.0005. 
(c) less than 0.0005. 


We can use our results to make inference about 
the relationship between age and type of weightlift- 
ing injury in the population of patients aged 18-30 
admitted to emergency rooms because 

(a) the sample is large, with 4111 weightlifting 
injuries in all. 

(b) the sample is close to an SRS of all weightlifting 
injuries for this age group. 

(c) there are only two types of weightlifting injury. 


commonly known as Wellbutrin® or Zyban®) and 
a placebo. Chantix® is different from most other 
quit-smoking products, in that it targets nicotine re- 
ceptors in the brain, attaches to them, and blocks 
nicotine from reaching them, whereas bupropion 
is an antidepressant often used to help people stop 
smoking. Generally healthy smokers who smoked at 
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least 10 cigarettes per day were assigned at random 
to take Chantix® (n = 352), bupropion (n = 329), 
or a placebo (n = 344). The study was double blind, 
with the response measure being continuous absence 
from smoking for weeks 9-12 of the study. Here is a 


two-way table of the results:'* jy) smoke 


TREATMENT 
CHANTIX® BUPROPION PLACEBO 
No smoking in weeks 9-12 155 97 61 
Smoked in weeks 9-12 197 U3) 283 


25.31 


(a) Give a 95% confidence interval for the 
difference between the proportions of smokers in 
the bupropion and placebo groups who did not 
smoke in weeks 9-12 of the study. 

(b) What proportion of each of the three groups 
in the sample did not smoke in weeks 9-12 of the 
study? Are there statistically significant differences 
among these proportions? State hypotheses and give 
a test statistic and its P-value. 

(c) In part (b), did you use a chi-square test for 
independence or homogeneity? Explain. 


Cell-only versus landline users. We suspect that 
people who rely entirely on cell phones will as a group 
be younger than those who have a landline telephone. 
Do data confirm this guess? The Pew Research Center 
interviewed separate random samples of cell-only and 
landline telephone users and broke down the samples 
by age group:"4 CELLAGE 


LANDLINE SAMPLE  CELL-ONLY SAMPLE 


Age 18-09 335 374 
Age 30-49 1242 347 
Age 50-64 1625 146 
Age 65 or older 1481 36 
Total 4683 903 
(a) Should you use a chi-square test of independence 
or homogeneity? 
(b) Do a complete analysis of these data, following 
the four-step process. 
25.32 College students need better sleep! A random 


sample of 871 students between the ages of 20 and 24 
at a large midwestern university completed a survey 
including questions about their sleep quality, moods, 
academic performance, physical health, and psycho- 
active drug use. Sleep quality was measured using the 
Pittsburgh Sleep Quality Index (PSQI), with students 
scoring less than or equal to 5 on the index classified 
as optimal sleepers, those scoring a 6 or 7 classified 
as borderline, and those scoring over 7 classified as 
poor sleepers. The following table looks at the rela- 
tionship between sleep quality classification and the 


use of over-the-counter (OTC) or prescription (Rx) 
stimulant medication more than once a month to help 
keep awake.” £ij7)) sLeEPQ 


USE OF OTC/Rx MEDS 
TO WAKE > 1X/MONTH 


SLEEP QUALITY ON PSQI INDEX 
OPTIMAL BORDERLINE POOR 


Yes 37 53 84 


No 266 186 245 


25.33 


25.34 


(a) Regarding this survey as approximately an SRS 
of college students, give a 95% confidence interval 
for the proportion of college students who use over- 
the-counter or prescription stimulant medication 
more than once a month to keep awake. 

(b) Find the conditional distributions of sleep 
quality for students who use over-the-counter or 
prescription stimulant medication more than once 
a month to keep awake and those who don’t. Make 
a graph that compares the two conditional distribu- 
tions. Use your work to describe the overall rela- 
tionship between students who use and don’t use 
medications to keep awake and their sleep quality. 
(c) Do students who use over-the-counter or 
prescription stimulant medication more than once 
a month to keep awake differ significantly in their 
quality of sleep from those who don’t? State hypoth- 
eses, give the chi-square statistic and its P-value, 
and state your conclusion. 


Did the randomization work? After randomly assign- 
ing subjects to treatments in a randomized compara- 
tive experiment, we can compare the treatment groups 
to see how well the randomization worked. We hope 
to find no significant differences among the groups. A 
study of how to provide premature infants with a sub- 
stance essential to their development assigned infants 
at random to receive one of four types of supplement, 
called PBM, NLCP, PL-LCP, and TG-LCP." 


(a) The subjects were 77 premature infants. Outline 
the design of the experiment if 20 are assigned to the 
PBM group and 19 to each of the other treatments. 
(b) The random assignment resulted in 9 females 

in the TG-LCP group and 11 females in each of the 
other groups. Make a two-way table of group by sex, 
and do a chi-square test to see if there are significant 
differences among the groups. What do you find? 


More on video gaming. The data for comparing two 
sample proportions can be presented in a two-way table 
containing the counts of successes and failures in both 
samples, with two rows and two columns. In Exercise 
25.2, a survey of the consequences of video gaming on 
14- to 18-year-olds is described. Here is another ques- 
tion from the survey. The question was about aggres- 
sive behavior as evidenced by getting into serious fights, 
and the comparison was between girls who have and 
have not played video games. Here are the data: 


25.35 


25.36 


SERIOUS FIGHTS 

YES NO 

Played games 36 50) 
Never played games 578 1436 


(a) Is there evidence that the proportions of all 14- 
to 18-year-old girls who played or have never played 
video games and have gotten into serious fights dif- 
fer? Find the two sample proportions, the x statistic, 
and its P-value. 

(b) Is there evidence that the proportions for 14- to 
18-year-old girls who have or have not gotten into 
serious fights differ between those who have played 
or have never played video games? Find the chi- 
square statistic y? and its P-value. 

(c) Show that (up to roundoff error) your x? is the 
same as x. The two P-values are also the same. These 
facts are always true, so you will often see chi-square for 
2 X 2 tables used to compare two proportions. 

(d) Suppose we were interested in finding out if 
the data gave good evidence that video gaming 

was associated with increased aggression in girls as 
evidenced by getting into serious fights. Can we 
use the z test for this hypothesis? What about the 
xX? test? What is the important difference between 
these two procedures? 


Unhappy rats and tumors. Some people think that 
the attitude of cancer patients can influence the 
progress of their disease. We can’t experiment with 
humans, but here is a rat experiment on this theme. 
Inject 60 rats with tumor cells, and then divide them 
at random into two groups of 30. All the rats re- 
ceive electric shocks, but rats in Group 1 can end 
the shock by pressing a lever. (Rats learn this sort of 
thing quickly.) The rats in Group 2 cannot control 
the shocks, which presumably makes them feel help- 
less and unhappy. We suspect that the rats in Group 1 
will develop fewer tumors. The results: 11 of the Group 
1 rats and 22 of the Group 2 rats developed tumors."” 


(a) Make a two-way table of tumors by group. 
State the null and alternative hypotheses for this 
investigation. 

(b) Although we have a two-way table, the chi- 
square test can’t test a one-sided alternative. Carry 
out the x test and report your conclusion. 


I think I'll be rich by age 30. A sample survey asked 
young adults (aged 19-25), “What do you think 
are the chances you will have much more than a 
middle-class income at age 30?” The JMP output in 
Figure 25.8 shows the two-way table and related infor- 
mation, omitting a few subjects who refused to respond 
or who said they were already rich.'$ RICHBY30 


(a) Should you use a chi-square test of independence 
or homogeneity? 


Exercises 603 
A\¥) Contingency Table 
Sex 
Count Female Male 
Col% 
Expected 
Carimchas2 
AlmostNo 96 98} 194 
4.06 3.88) 
95.1508 |98.8492 
OR OO OZ O07: 
Some 426 286| 712 
18.00 6S: 


249) 21336257387 


fe 16.8842 |16.2525 
& |5050 696 720/1416 
5 29.40| 29.28 
694.503 |721.497 
0.0032) 0.0031 
Good 663 758 |1421 
28.01| 30.83 
696.955 |724.045 
1.6543) 1.5924 
NearCer 486 yey 7/ || aL eles) 
20.53| 24.28 
531.177 |551.823 
3.8424) 3.6986 
2367 2459 |4826 
A| Tests 
N DF -LogLike RSquare (U) 
4826 4 22.067196 0.0066 
A| Test ChiSquare Prob>ChiSq 
Pearson 43.946 <.0001* 


FIGURE 25.8 
JMP output for the sample survey responses of Exercise 25.36. 


25.37 


(b) Use the output as the basis for a discussion of 
the relationship between sex and a person’s assess- 
ment of their chances of being rich by age 30. 


Sexy magazine ads? Look at full-page ads in maga- 
zines with a young adult readership. Classify ads that 
show a model as “not sexual” or “sexual” depending 
on how the model is dressed (or not dressed). Here are 
data on 1509 ads in magazines aimed at young men, at 
young women, or at young adults in general: 


READERS 
MEN WOMEN GENERAL 
Sexual 105 225 66 
Not sexual | 514 351 248 


Figure 25.9 (page 604) displays Minitab chi-square 
output. Use the information in the output to de- 
scribe the relationship between the target audience 
and the sexual content of ads in magazines for young 


adults, Sy) sexvans 


Mistakes in using the chi-square test are unusually common. 
Exercises 25.38 to 25.41 illustrate several kinds of mistakes. 
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m Session 


Men Women General All 
Sexual 105 225 66 396 
16.96 39.06 24302 26.24 
162.4 P52 82.4 396.0 
20.312 36.074 3.265 * 
Not sexual 514 Sieyil 248 a kes} 
83.04 60.94 78.98 Wietreghs 
456.6 424.8 23:13:6 L130 
hia 2ou 12335 6.2 * 
All 619 56 314 1509 
100.00 100.00 100.00 100.00 
619.0 576.0 314.0 1509.0 
* * * * 
Cell Contents: Count 


% of Column 
Expected count 
Contribution to Chi-square 


Pearson Chi—Square = 


co) 


FIGURE 25.9 
Minitab output for a study of ads in 
magazines, Exercise 25.37. 


25.38 Sorry, no chi-square. An experimenter hid a toy 
from a dog behind either Screen A or Screen B. In the 
first phase the toy was always hidden behind Screen A, 
whereas in the second phase the toy was always hid- 
den behind Screen B. Will the dog continue to look 
behind Screen A in the second phase? This was tried 
under three conditions. In the Social-Communica- 
tive condition, the experimenter communicated with 
the dog by establishing eye contact and addressing the 


80.874, 


DF = 2, P-Value = 0.00 
7” 


A 


(a) The data do show a difference in the number 

of errors for the different conditions. Show this by 
comparing suitable percents. 

(b) The researchers used a more complicated but 
exact procedure rather than chi-square to assess sig- 
nificance for these data. Why can’t the chi-square 
test be trusted in this case? 

(c) If you use software, does the chi-square output 
for these data warn you against using the test? 


dog while hiding the Hoy, 1 the Noncommunicative 25.39 Sorry, no chi-square. How do U.S. residents who 
condition, the toy wae hidden without communica- travel overseas for leisure differ from those who travel 
tion; in the Nonsocial condition, the toy was dragged for business? Here is the breakdown by occupation:?! 
by a string to be hidden without any interaction from 
the experimenter. There were 12 dogs assigned at ran- LEISURE BUSINESS 
dom to each condition, and the dog had up to three TRAVELERS TRAVELERS 
trials to find the toy hidden behind Screen B in phase Professional/technical 36% 39% 
: oe sere - - dog ee to search Maiwded Scartne 23% 48% 
ehind Screen A, with the number of errors ranging 
Retired 14% 3% 
from zero if the dog found the toy behind Screen B on uate ae ~ 
ee : . Student 7% 3% 
the initial trial up to three if the dog never correctly 
chose Screen B. Here are the data:’° HIDETOY Other 20% ™% 
Total 100% 100% 
NUMBER OF ERRORS 
0 4 2 83 Explain why we don’t have enough information to 
Ran erred | ee Se ae ee ones test . es whether these two 
gaan ae 7 0s GS istributions differ significantly. 
Nao en 25.40 Sorry, no chi-square. Socially isolated individu- 


als have been shown to be at greater risk for the 


development of a variety of illnesses and increased 
mortality. It is a more serious problem among the 
elderly as decreasing economic resources, mobility 
impairment, and death of contemporaries combine 
to limit social contact. In a study of 6500 men and 
women in the English Longitudinal Study of Aging, 
individuals were classified according to social iso- 
lation, with 5269 classified as low/average and the 
remaining 1231 classified as high social isolation. 
The following table shows the incidence of several 
limiting long-standing illnesses for the two groups.” 
Explain why it is not correct to use a chi-square test 
on this table to compare the “low/average isolation” 
and “high isolation” groups. Note that to use the 
chi-square test in a two-way table, each individual 
must fall into one cell of the table. 


LOW/AVERAGE HIGH 
ISOLATION ISOLATION 

Cancer 168 36 
Diabetes 332 89 
Coronary heart disease 193 43 
Chronic lung disease 65 34 
Stroke 134 43 
Arthritis 1778 477 
Mobility impairment 2940 769 
Depression 960 359 


25.41 Sorry, no chi-square. Does eating chocolate trigger 
headaches? To find out, women with chronic head- 
aches followed the same diet except for eating choc- 
olate bars and carob bars that looked and tasted the 
same. Each subject ate both chocolate and carob bars 
in random order with at least three days between. Each 
woman then reported whether or not she had a head- 
ache within 12 hours of eating the bar. Here is a two- 
way table of the results for the 64 subjects:” 


NO HEADACHE HEADACHE 
Chocolate 53} 11 
Carob (placebo) 38 26 


The researchers carried out a chi-square test on this 
table to see if the two types of bar differ in triggering 
headaches. Explain why this test is incorrect. (Hint: 
There are 64 subjects. How many observations 
appear in the two-way table?) 
The remaining exercises concern larger tables that require software 
for easy analysis. In many cases, you should follow the Plan, Solve, 
and Conclude steps of the four-step process in your answers. 


25.42 Smokers rate their health. The University of 
Michigan Health and Retirement Study (HRS) sur- 
veys more than 22,000 Americans over the age of 


Fotosearch/Superstock 
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50 every two years. A subsample of the Health and 
Retirement Study (HRS) participated in the 2009 
Internet-based survey that collected information on 
a number of topical areas, including health (physical 
and mental, health behaviors); psychosocial items; 
economics (income, assets, expectations, and con- 
sumption); and retirement. Two of the questions 
asked on the Internet survey were, “Would you say 
your health is excellent, very good, good, fair, or 
poor?” and “Do you smoke cigarettes now?” The 
two-way table summarizes the answers on these two 
questions. EJ),| 


CURRENT SMOKER 
YES NO 
Excellent 25 484 
Very good 115 Soi 
Good 145 1309 
Fair 90 545 
Poor 29 11 


(a) Regarding the HRS Internet survey as approxi- 
mately an SRS of Americans over the age of 50, 
give a 99% confidence interval for the proportion 
of Americans over the age of 50 who are current 
smokers. 

(b) Compare the conditional distributions of 
self-evaluation of health for current smokers and 
nonsmokers using both a table and a graph. What 
are the most important differences? 

(c) Carry out the chi-square test for the hypothesis of 
no difference between the self-evaluation of health 
for current smokers and nonsmokers. What would 
be the mean of the test statistic if the null hypothesis 
were true? The value of the statistic is so far above 
this mean that you can see at once that it must be 
highly significant. What is the approximate P-value? 
(d) Look at the terms of the chi-square statistic 
and compare observed and expected counts in 
the cells that contribute the most to chi-square. 
Based on this and your findings in part (b), write 
a short comparison of the differences in self- 
evaluation of health for current smokers and 
nonsmokers. 


25.43 Who goes to religious services? The General So- 


cial Survey (GSS) asked 
this question: “Have you 
attended religious services 
in the last week?” Here 
are the responses for those 
whose highest degree was 
high school or above: 


7 ln SERVICES 
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HIGHEST DEGREE HELD 
HIGH JUNIOR 
SCHOOL COLLEGE BACHELOR GRADUATE 
Attended services 400 62 146 76 
Did not attend services| 880 101 232 105 


25.44 


(a) Carry out the chi-square test for the hypoth- 
esis of no relationship between the highest degree 
attained and attendance at religious services in the 
last week. What do you conclude? 

(b) Make a 2 X 3 table by omitting the column 
corresponding to those whose highest degree was 
high school. Carry out the chi-square test for the 
hypothesis of no relationship between the type 

of advanced degree attained and attendance at 
religious services in the last week. What do you 
conclude? 

(c) Make a 2 X 2 table by combining the counts 
in the three columns that have a highest degree 
beyond high school, so that you are compar- 

ing adults whose highest degree was high school 
against those whose highest degree was beyond 
high school. Carry out the chi-square test for the 
hypothesis of no relationship between attaining 

a degree beyond high school and attendance at 
religious services for this 2 X 2 table. What do 
you conclude? 

(d) Using the results from these three chi-square 
tests, write a short report explaining the relation- 
ship between attendance at religious services in the 
last week and the highest degree attained. As part 
of your report, you should give the percentages who 
attended religious services for each of the four high- 
est degrees. 


Condom usage among high school students. The 
Centers for Disease Control developed the Youth 
Risk Behavior Surveillance System (YRBSS) to 
monitor six categories of priority health-risk be- 
haviors among youth—behaviors that contribute 
to unintentional injuries and violence; tobacco use; 
alcohol and other drug use; sexual behaviors that 
contribute to unintended pregnancy and sexually 
transmitted diseases; unhealthy dietary behaviors; 
and physical inactivity. A multistage sample design 
is used to produce representative samples of stu- 
dents in grades 9-12, who then fill out a question- 
naire on these behaviors. The following data are 
for the question, “Did Not Use A Condom Dur- 
ing Last Sexual Intercourse?” The two-way table 
of grade and condom usage only included students 
who were currently sexually active. Here are the 
results:” f))| conpoms 
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CONDOM USED 
YES NO 
9th grade 248 417 
10th grade | 341 549 
11th grade | 467 772 
12th grade | 815 920 


Describe the most important differences between 
condom usage and grade. Is there a significant over- 
all difference between the proportion who used con- 
doms in the different grades? 


How are schools doing? The nonprofit group Public 
Agenda conducted telephone interviews with a strati- 
fied sample of parents of high school children. There 
were 202 black parents, 202 Hispanic parents, and 
201 white parents. One question asked was, “Are the 
high schools in your state doing an excellent, good, 
fair, or poor job, or don’t you know enough to say?” 
Here are the survey results:”° £j)) HIGHSCHL 


BLACK HISPANIC WHITE 
PARENTS PARENTS PARENTS 
Excellent 12; 34 yy 
Good 69 5) 81 
Fair i) 61 60 
Poor 24 24 24 
Don’t know TD, 28 14 
Total 202 202 201 


Are the differences in the distributions of responses 
for the three groups of parents statistically signifi- 
cant? What departures from the null hypothesis “no 
relationship between group and response” contribute 
most to the value of the chi-square statistic? Write a 
brief conclusion based on your analysis. 


Complications of bariatric surgery. Bariatric sur- 
gery, or weight-loss surgery, includes a variety of pro- 
cedures performed on people who are obese. Weight 
loss is achieved by reducing the size of the stomach 
with an implanted medical device (gastric banding), 
through removal of a portion of the stomach (sleeve 
gastrectomy), or by resecting and rerouting the small 
intestines to a small stomach pouch (gastric bypass 
surgery). Because there can be complications us- 
ing any of these methods, the National Institute 
of Health recommends bariatric surgery for obese 
people with a body mass index (BMI) of at least 40, 
and for people with BMI 35 and serious coexisting 
medical conditions such as diabetes. Serious compli- 
cations include potentially life-threatening, perma- 
nently disabling, and fatal outcomes. Here is a two-way 


table for data collected in Michigan over several 
years giving counts of non-life-threatening compli- 
cations, serious complications, and no complications 
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Support for political parties. Political parties want to know 
what groups of people support them. The General Social Survey 
(GSS) asked its 2012 sample, “Generally speaking, do you usu- 


for these three types of surgeries:”’ 


ally think of yourself as a Republican, Democrat, Independent, 
or what?” The GSS is essentially an SRS of American adults. 


the historic Cannery Row district of Monterey, Cali- 
fornia. Each year, the aquarium interviews a random 
sample of visitors as they exit the museum. The sur- 
vey includes visitor demographic information, use of 
social media, and opinions on their aquarium visit. 
For each visitor sampled during the summer months 
of June, July, and August, here is the distribution of 
the number of children in their group in each year 
during 2009-13: Bi) 


NUMBER OF CHILDREN | 2009 2012 2013 
0 390 318 268 
1 79 99 131 
2 66 UL 137 
3 or more 41 55 70 


How has the distribution of the number of children 
in the group changed over this five year period? Are 
the differences significant? One of the Aquarium’s 
strategic goals is to attract a younger audience, in- 
cluding more families and intergenerational groups. 
Does it appear that they are meeting their goal? 
Explain. 


TYPE OF COMPLICATION Here is a large two-way table breaking down the responses by 
NON-LIFE- the highest degree the subject held: 
THREATENING SERIOUS NONE | TOTAL 
Gastric banding 81 46 5253 | 5380 HIGH JR. 
Sleeve gastrectomy 31 19 804 854 NONE SCHOOL COLLEGE BACHELOR GRADUATE 
Gastric bypass 606 325 8110 | 9041 Strong Democrat} 47 161 35 60 3) 
Not strong 
(a) Is this study an experiment? Explain your Democrat 43 176 23 63 38 
answer. Independent, 
(b) Is there a significant difference in the distribu- near Democrat St Ute 10 43 28 
tion of type of complication for the three types of Independent 101 =: 186 26 36 24 
surgery? Which surgeries have the greatest chance Independent, 
of complications? Can we conclude that the surgery near Republican | 20 76 16 29 16 
is more dangerous, or could there be other factors Not strong 
associated with the increased risk? Republican 18 130 16 64 D2 
25.47 Bring your kids to the Monterey Bay Aquarium! Strong 
42 The Monterey Bay Aquarium, founded in 1984, is Republican 14-101 19 43 15 
situated on the beautiful coast of Monterey Bay in Other party 5 22 5 14 8 


Exercises 25.48 to 25.50 are based on this table. 


25.48 


25.49 


25.50 


Other parties. Give a 95% confidence interval for 
the proportion of adults who are “Independent.” 


Party support in brief. Make a2 5 table by com- 
bining the counts in the three rows that mention 
Democrat and in the three rows that mention Re- 
publican and ignoring strict independents and sup- 
porters of other parties. We might think of this table 
as comparing all adults who lean Democrat and all 
adults who lean Republican. How does support for 
the two major parties differ among adults with differ- 
ent levels of education? £jj)) comBrart 


Party support in full. Use the full table to analyze 
the differences in political party support among 
levels of education. The sample is so large that 
the differences are bound to be highly significant, 
but give the chi-square statistic and its P-value 
nonetheless. The main challenge is in seeing what 
the data say. Does the full table yield any insights 
not found in the compressed table you analyzed in 
Exercise 25.49? FULLPART 
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25.52 


Exploring the Web 


Make your own table. The Behavioral Risk Factor Surveillance System 
(BRFSS) is an ongoing data collection program designed to measure behavioral 
risk factors for the adult population (18 years of age or older) living in households. 
Data are collected from a random sample of adults (one per household) through 
a telephone survey. Go to the website apps.nccd.cdc.gov/BRFSS/ and 
under BRFSS contents click on Web Enable Analysis Tool (WEAT) and then click 
on Cross Tabulation Analysis. After a year is selected, a window will open that will 
allow you to produce two-way tables. 


(a) Choose a state of interest to you and two variables for the two-way table. 
For example, you could choose Connecticut and look at the relationship between 
a demographic variable such as education level and a variable such as health care 
coverage. Once you have chosen your state and two variables, click on run report 
on the bottom of the page. A two-way table will appear in a new window. 


(b) Is there a relationship between the two variables you have selected? If the 
relationship is statistically significant, describe the relationship in a brief report 
using percentages from the table and an appropriate graph. 


What do the voters think? The American National Election Studies (ANES) 
is the leading academically run national survey of voters in the United States, 
conducted before and after every presidential election. SDA (Survey Documen- 
tation and Analysis) is a set of programs that allows you to analyze survey data 
and includes the ANES survey as part of its archive. Go to the website sda. 
berkeley.edu/ and click on Archive. Go to the 2012 ANES survey. 


(a) Open the pre-election questions. Under Liberal/Conservative, choose the 
variable “liberal/conservative self-placement on a 7 point scale.” Use this as your 
row variable. Under Energy and Climate Change, choose the variable “Anthro- 
pogenic climate change,” or climate change caused or produced by humans. Use 
this as your column variable. In the details for the table, set Weight to none and for 
N of Cases to Display, make sure the unweighted box is checked. For Percentaging, 
choose row percents. Now click on “run the table.” 


(b) To analyze the data, make a 3 X 3 table by combining the rows for 
extremely liberal and liberal, slightly liberal, middle of the road and slightly con- 
servative, and conservative and extremely conservative. Carry out a formal test 
to determine if there is a relationship between these two variables, and then 
describe the relationship in a brief report using percentages from the table or an 
appropriate graph. 

(c) Select two other variables of interest to you, and analyze the relationship 
between these two variables. If there is a more recent survey than 2012, you 
should use it. 
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Inference for Regression 


hen a scatterplot shows a linear relationship between a quantitative ex- 

planatory variable x and a quantitative response variable y, we can use 

the least-squares line fitted to the data to predict y for a given value of x. 
When the data are a sample from a larger population, we need statistical inference 
to answer questions like these about the population: 


Is there really a linear relationship between x and y in the population, or 
might the pattern we see in the scatterplot plausibly arise just by chance? 


What is the slope (rate of change) that relates y to x in the population, 
including a margin of error for our estimate of the slope? 


If we use the least-squares line to predict y for a given value of x, how 
accurate is our prediction (again, with a margin of error)? 


This chapter shows you how to answer these questions. Here is an example we will 
explore. 


EXAMPLE 26.1 Crying and IQ 


Infants who cry easily may be more easily stimulated than others. This may 
be a sign of higher IQ, Child development researchers explored the relationship be- 
tween the crying of infants four to ten days old and their later |Q test scores. A snap 
of a rubber band on the sole of the foot caused the infants to cry. The researchers 
recorded the crying and measured its intensity by the number of peaks in the most 
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In this chapter 
we cover... 


26.1 Conditions for regression 
inference 


26.2 Estimating the parameters 
26.3 Using technology 


26.4 Testing the hypothesis of no 
linear relationship 


26.5 Testing lack of correlation 


26.6 Confidence intervals for the 
regression slope 


26.7 Inference about prediction 


26.8 Checking the conditions 
for inference 


clu) 
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scatterplot 


FIGURE 26.1 

Scatterplot of the IQ score of infants 
at age three years against the inten- 
sity of their crying soon after birth, 
with the least-squares regression 
line, for Example 26.1. 


active 20 seconds. They later measured the children’s IQ at age three years using the 
Stanford-Binet IQ test. Table 26.1 contains data on 38 infants.'! Do children with 
higher crying counts tend to have higher IQs? 


CRYING 1Q CRYING 1Q CRYING 1Q CRYING 1Q 
10 87 20 90 17 94 12 94 
2 97 16 100 19 103 12 103 

9 103 23 103 13 104 14 106 
16 106 Ei 108 18 109 10 109 
18 109 15 112 18 112 23 113 
15 114 Zl 114 16 118 9 119 
12 119 12 120 19 120 16 124 
20 132 15 1333 ey 135 31 1335) 
16 136 17 141 30 155 22 157 
33 159 13 162 


)PLAN: Make a scatterplot. If the relationship appears linear, use correlation and 
regression to describe it. Finally, ask whether there is a statistically significant linear 
relationship between crying and IQ. 


BOR ERGs eH Chapters 4 and 5 introduced the data analysis that must 
come before inference. The first steps we take are a review of this data analysis. 
Figure 26.1 is a scatterplot of the crying data. Plot the explanatory variable (count 
of crying peaks) horizontally and the response variable (IQ) vertically. Look for the 
form, direction, and strength of the relationship as well as for outliers or other deviations. 
There is a moderately strong positive linear relationship, with no extreme outliers or 
potentially influential observations. 
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Because the scatterplot shows a roughly linear (straight-line) pattern, the correlation correlation 
describes the direction and strength of the relationship. The correlation between crying 

and IQ is r= 0.455. We are interested in predicting the response from information 

about the explanatory variable. So we find the least-squares regression line for least-squares line 
predicting IQ from crying. The equation of the regression line is 


A 


y=a-+ bx 
91.27 + 1.493x 


Children who cry more vigorously do tend to have higher 
IQs. Because r? = 0.207, only about 21% of the variation in IQ scores is explained 
by crying intensity. Prediction of IQ will not be very accurate. It is nonetheless 
impressive that behavior soon after birth can even partly predict IQ three years later. 
Is this observed relationship statistically significant? We must now develop tools for 
inference in the regression setting. 


We can fit a regression line to any data relating two quantitative variables, 
though the results are useful only if the scatterplot shows a linear pattern. Statistical 
inference requires more detailed conditions. Because the conclusions of inference 
always concern some population, the conditions describe the population and how 
the data are produced from it. The slope b and intercept a of the least-squares line 
are statistics. That is, we calculated them from the sample data. These statistics 
would take somewhat different values if we repeated the study with different infants. 
To do inference, think of a and b as estimates of unknown parameters that describe 
the population of all infants. 


Conditions for Regression Inference 


We have n observations on an explanatory variable x and a response variable y. Our 
goal is to study or predict the behavior of y for given values of x. 


» For any fixed value of x, the response y varies according to a Normal distribution. 
Repeated responses y are independent of each other. 


al 


The mean response 1, has a straight-line relationship with x given by a 
population regression line 


[itn = GF ae [oR 


© The slope B and intercept a are unknown parameters. 


a The standard deviation of y (call it v7) is the same for all values of x. The 
value of o is unknown. 


There are thus three population parameters that we must estimate from the data: a, 


B, and a. 


These conditions say that in the population there is an “on the average” 
straight-line relationship between y and x. The population regression line 
fy = a + Bx says that the mean response 4, moves along a straight line as the 
explanatory variable x changes. We can’t observe the population regression line. 
The values of y that we do observe vary about their means according to a Normal 
distribution. If we hold x fixed and take many observations on y, the Normal pat- 
tern will eventually appear in a stemplot or histogram. In practice, we observe y 
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for many different values of x, so that we see an overall linear pattern formed by 
points scattered about the population line. The standard deviation o determines 
whether the points fall close to the population regression line (small ~) or are 
widely scattered (large o). 

Figure 26.2 shows the conditions for regression inference in picture form. The 
line in the figure is the population regression line. The mean of the response y 
moves along this line as the explanatory variable x takes different values. The 
Normal curves show how y will vary when x is held fixed at different values. All the 
curves have the same a, so the variability of y is the same for all values of x. You 
should check the conditions for inference when you do inference about regression. 
We will see later how to do that. 


For any fixed x, the responses y 


follow a Normal distribution 
with standard deviation o. 


FIGURE 26.2 

The nature of regression data when the 

conditions for inference are met. The 

line is the population regression line, 

which shows how the mean response 

4, Changes as the explanatory variable 

x changes. For any fixed value of x, the Byes Os Bx 
observed response y varies according to 

a Normal distribution having mean pw, 

and standard deviation o. 


The first step in inference is to estimate the unknown parameters a, B, and o. 


Estimating the Population Regression Line 


When the conditions for regression are met and we calculate the least-squares line 
y =a + bx, the slope b of the least-squares line is an unbiased estimator of the popu- 
lation slope B, and the intercept a of the least-squares line is an unbiased estimator 
of the population intercept a. 


The data in Figure 26.1 satisfy the condition of scatter about an invisible population 
regression line reasonably well. The least-squares line is y = 91.27 + 1.493x. The 
slope is particularly important. A slope is a rate of change. The population slope B 
says how much higher average IQ is for children with one more peak in their crying 
measurement. Because b = 1.493 estimates the unknown B, we estimate that, on 
the average, IQ is about 1.5 points higher for each added crying peak. 

We need the intercept a = 91.27 to draw the line, but it has no statistical meaning 
in this example. No child had fewer than nine crying peaks, so we have no data near 
x = O. We suspect that all typical children would cry when snapped with a rubber 
band, so that we will never observe x = O. m 


26.2 Estimating the Parameters 


The remaining parameter is the standard deviation o, which describes the vari- 
ability of the response y about the population regression line. The least-squares line 
estimates the population regression line. So the residuals estimate how much y var- 
ies about the population line. Recall that the residuals are the vertical deviations of 
the data points from the least-squares line: 


residual = observed y — predicted y 
p= 3 
There are n residuals, one for each data point. Because o is the standard devia- 
tion of responses about the population regression line, we estimate it by a sample 
standard deviation of the residuals. We call this sample standard deviation the 
regression standard error to emphasize that it is estimated from data. The residuals 


from a least-squares line always have mean zero. That simplifies their standard 
error. 


Regression Standard Error 


The regression standard error is 


= L 5 Zz) 

seas y) residual 
- (5 D0-5 
TON eee) ie 


Use s to estimate the standard deviation o@ of responses about the mean given by the 
population regression line. 


Because we use the regression standard error so often, we just call it s. The 
quantity =(y — y)* is the sum of the squared deviations of the data points from 
the line. We average the squared deviations by dividing by n — 2, the number of 
data points less 2. It turns out that if we know n — 2 of the n residuals, the other 
two are determined. That is, n — 2 are the degrees of freedom of s. We first met 
the idea of degrees of freedom in the case of the ordinary sample standard devia- 
tion of n observations, which has n — 1 degrees of freedom. Now we observe two 
variables rather than one, and the proper degrees of freedom are n — 2 rather 
than n — 1. 

Calculating s is unpleasant. You must find the predicted response for each x in 
your data set, then the residuals, and then s. In practice you will use software that 
does this arithmetic instantly. Nonetheless, here is an example to help you under- 
stand the standard error s. 
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residuals 


THE JINX! 
TUL Athletes are often jinxed. 
Ht WL We read of “the rookie 
of the year jinx,” the 
“cover of Sports Illustrated jinx,” and 
many others. That is, athletes who 
are recognized for an outstanding 
performance often fail to do as 
well in the future. No, nature 
isn’t retaliating against them. It’s 
just random variation about their 
long-term mean performance. 
They were recognized because they 
randomly varied above their typical 
performance, and in the future they 
return to the mean or randomly 
vary down from it. If they randomly 
vary down, they can hope for a 
“comeback” award the next year. 


degrees of freedom 


Table 26.1 shows that the first infant studied had 10 crying peaks and a later IQ of 
87. The predicted IQ for x = 10 is 


y = 91.27 + 1.493x 
= 91.27 + 1.493(10) = 106.2 


The residual for this observation is 


residual = y — y 
= 87 — 106.2 = —-19.2 


CRYINGR 
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That is, the observed IQ for this infant lies 19.2 points below the least-squares line 
on the scatterplot. 

Repeat this calculation 37 more times, once for each subject. The 38 residuals 
are 


—19:20 =31.13. =22.65. =1518 =12.18 =15.15 =16.63 =6.18 
—1.70 —22.60 -668  —-617 —-9.15 -—23.58  —-9.14 2.80 
-914 -166  —-6.14 -—12.60 0.34 —8.62 2.85 14.30 

9.82 10.82 0.37 8.85 10.87 19.34 10.89 —2.55 
20.85 24.35 18.94 32.89 18.47 51.32 


Check the calculations by verifying that the sum of the residuals is zero. It is 0.04, 
not quite zero, because of roundoff error. Another reason to use software in regres- 
sion is that roundoff errors in hand calculation can accumulate to make the results 
inaccurate. 
The variance about the line is 
1 


= 5 » residual? 
A= 


1 2 — 2 Rien 2 
2 38 2g l(-19.20) + (—31.13)? +--+ + (51.32)| 


‘ 
= =~(11,023.3) = 306.20 
36 | ) 


Finally, the regression standard error is 


s = V 306.20 = 17.50 m 


We will study several kinds of inference in the regression setting. The re- 
gression standard error s is the key measure of the variability of the responses 
in regression. It is part of the standard error of all the statistics we will use for 
inference. 


(4) LaunchPad Online Resources 


* The StatBoards video, Fitting the Least-Squares Regression Line, 
discusses fitting the least-squares regression line, including residuals 
and standard errors in an example. 

* The StatClips Examples video, Regression: Assumptions Example B, 
examines the residuals in an example to assess the regression 
assumptions. 


Apply Your Knowledge 


© Glowimages RM/age fotostock 


26.1 Wine and Cancer in Women. Some studies have suggested that a nightly 
glass of wine may not only take the edge off a day but also improve health. Is 
wine good for your health? A study of nearly 1.3 million middle-aged British 
women examined wine consumption and the relative risk of breast cancer. 
The relative risk is the proportion of those in the study who drank a given 
amount of wine and who developed breast cancer divided by the proportion 
of nondrinkers in the study who developed breast cancer. For example, if 10% 
of the women in the study who drank 10 grams of wine developed breast can- 
cer and 9% of nondrinkers in the study developed breast cancer, the relative 


risk of breast cancer for women drinking 10 grams of wine per day would be 
10%/9% = 1.11. A relative risk greater than 1 indicates a greater propor- 
tion of drinkers in the study developed breast cancer than nondrinkers. Wine 
intake is the mean wine intake, in grams per day, of all women in the study 
who drank some wine but less than or equal to two drinks per week; who 
drank between three and six drinks per week; who drank between seven and 
14 drinks per week; and who drank 15 or more drinks per week. Here are the 
data (for drinkers only):* Ej.) cancer 


Wine intake (grams per day) (x) 2.5 85 15.5 26.5 
Relative risk (y) LOO Noes Nts iLvZ 


(a) Examine the data. Make a scatterplot with wine intake as the 
explanatory variable, and find the correlation. There is a strong 
linear relationship. 

(b) Explain in words what the slope B of the population regression line 
would tell us if we knew it (note that these data represent averages 
over large numbers of women and are an example of an ecological 
correlation (see page 146), and one must be careful not to interpret 
the data as applying to individuals). Based on the data, what are 
the estimates of B and the intercept a of the population regression 
line? 

(c) Calculate by hand the residuals for the four data points. Check that 
their sum is 0 (up to roundoff error). Use the residuals to estimate the 
standard deviation o that measures variation in the responses (rela- 
tive risk) about the means given by the population regression line. You 
have now estimated all three parameters. 


Basic “two-variable statistics” calculators will find the slope b and intercept a 
of the least-squares line from keyed-in data. Inference about regression requires in 
addition the regression standard error s. At this point, software or a graphing cal- 
culator that includes procedures for regression inference becomes almost essential 
for practical work. 

Figure 26.3 shows regression output for the data of Table 26.1 from a graph- 
ing calculator, three statistical programs, and a spreadsheet program. When 
we entered the data into the programs, we called the explanatory variable 
“Crycount.” The software outputs use that label. The graphing calculator just 
uses “x” and “y” to label the explanatory and response variables. You can locate 
the basic information in all the outputs. The regression slope is b = 1.4929, 
and the regression intercept is a = 91.268. The equation of the least-squares 
line is therefore (after rounding) just as given in Example 26.1. The regression 
standard error is s = 17.4987, and the squared correlation is r? = 0.207. Both 
of these results reflect the rather wide scatter of the points in Figure 26.1 about 
the least-squares line. 

Each output contains other information, some of which we will need shortly and 
some of which we don’t need. In fact, we left out some output to save space. Once 
you know what to look for, you can find what you want in almost any output and 
ignore what doesn’t interest you. 


26.3 Using Technology 
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FIGURE 26.3 Texas Instruments Graphing Calculator 
Regression of IQ on crying peaks: : : 

output from a graphing calculator, three LinkegsTTest LinkedTTest 
statistical programs, and a spreadsheet cere pte eee pte 
ae L=3,.865 Th=1.4929 


5=17,4987 


=) 
GBH re=, 2078 
83 r=, 4558 


V | Linear Fit 
TQ = 91.268299 + 1.4928966*Crycount 


i Summary of Fit 
RSquare 0.207 
RSquare Adj 0.184972 
Root Mean Square Error 17 .49872 
Mean of Response 117.2368 
Observations (or Sum Wgts) 38 


>| Lack of Fit 
y Analysis of Variance 


V/ Parameter Estimates 
Term Estimate Std Error t Ratio Prob > |t| 
Intercept 91.268299 8.934215 10.22 <.0001* 
Crycount 1.4928966 0.487001 3.07 0.0041* 
Minitab 


a Session 
Regression Analysis: IQ versus Crycount 


The regression equation is 
TQ = 91.3 + 1.49 Crycount 


Predictor Coef SE Coef 
Constant 91.268 8.934 
Crycount 1.4929 0.4870 


17.50 : R-Sq (adj) 


Excel 


£9 Microsoft Excel - Book1 210) xI 


A | 
_| SUMMARY OUTPUT 


Regression statistics 
Multiple R 0.4550 
R Square 0.2070 
Adjusted R Square 0.1850 
Standard Error 17.4987 
Observations 38 


Coefficients StandardError t Stat P-value Lower 95% Upper 95% 
Intercept 91.2683 8.9342 10.2156 3.5E-12 73.1489 109.3877 
Crycount 1.4929 0.4870 3.0655 0.004105 0.5052 2.4806 


ld 4. |bi\sheets /Sheet1 fSheet2 fsheet3 / 
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Crunchlt! FIGURE 26.3 
(Continued) 
Results - Simple Linear Regression + 
Export wv 


Fitted Equation: IQ = 91.27 + 1.493 * Crycount 


(Intercept) ] 91.27 8.934 1022 <0.0001 
Crycount 1.493 0.4870 3.065 0.004105 


r-Squared: 0.2070 


Adjusted r-Squared: 0.1850 


17.50 


sigma: 


Apply Your Knowledge 


26.2 Introspection and Gray Matter. The ability to introspect about self- 
performance is key to human subjective experience. Accurate introspection 
requires discriminating correct decisions from incorrect ones, a capacity that 
varies substantially across individuals. Are individual differences in intro- 
spective ability reflected in the anatomy of brain regions responsible for this 
function? The following data are a measure of introspective ability (labeled 
Aroc and based on the performance of subjects on a task) and a measure of 
gray-matter volume (Brodmann area) in the anterior prefrontal cortex of the 
brain of 29 subjects.’ INTROSP 


Aroc (0.55 0.58 0.59 0.59 0.59 0.61 0.62 0.63 0.63 0.63 
Wolbime | 59 2 4s) os ts Ol Sh S57 Si fe 
Aroc 0.63 (0.64 (0.65 0.65 0.65 0.65 0.65 0.66 0.66 0.67 
Volume| 72 62 58 62 65 $70 75 60 63 71 
Aroc | 0.67 0.67 0.68 0.69 0.70 0.70 0.71 0.72 0.75 
Volume) 71 80 68 72 66 73 61 80 = 75 


We want to predict volume from Aroc. Figure 26.4 shows JMP regression 
output for these data. 


(a) Make a scatterplot suitable for predicting volume from Aroc. What 
is the squared correlation 1’? 


(b) For regression inference, we must estimate the three parameters a, B, 
and a. From the output, what are the estimates of these parameters? 


(c) What is the equation of the least-squares regression line of volume 
on Aroc? Add this line to your plot. We will continue the analysis of 
these data in later exercises. 


26.3 Great Arctic Rivers. One effect of global warming is to increase the flow 
of water into the Arctic Ocean from rivers. Such an increase may have 
major effects on the world’s climate. Six rivers (Yenisey, Lena, Ob, Pechora, 
Kolyma, and Severnaya Dvina) drain two-thirds of the Arctic in Europe and 
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FIGURE 26.4 
JMP output for the introspective 
ability data, for Exercise 26.2. 


V | Linear Fit 
Volume = 10.065511 + 86.030829*Aroc 
Y' Summary of Fit 


RSquare 0.200628 
RSquare Adj 0.171022 
Root Mean Square Error 7.908796 
Mean of Response 65.89655 
Observations (or Sum Wgts) 29 
>| Lack of Fit 
> 


Analysis of Variance 


Y/ Parameter Estimates 


Term Estimate Std Error t Ratio Prob > |t| 
Intercept 10.065511 21.49751 0.47 0.6434 
Aroc 86.030829 33.04842 2.60 0.0148* 


Asia. Several of these are among the largest rivers on earth. Table 26.2 pre- 
sents the total discharge (amount of water flowing from these rivers) each 
year from 1936 to 2010.4 Discharge is measured in cubic kilometers of water. 
Use software to analyze these data. ely) ARCTIC 


(a) Make a scatterplot of river discharge against time. Is there a clear increas- 
ing trend? Calculate r’ and briefly interpret its value. There is considerable 
year-to-year variation, so we wonder if the trend is statistically significant. 


(b) Asa first step, find the least-squares line and draw it on your plot. 
Then find the regression standard error s, which measures scatter 
about this line. We will continue the analysis in later exercises. 


YEAR DISCHARGE | YEAR’ DISCHARGE | YEAR DISCHARGE | YEAR DISCHARGE 
1936 zi 1955 1656 1974 2000 1993 1845 
1937 1713 1956 1721 1975 1928 1994 1902 
1938 1860 1957 1762 1976 1653 1995 1842 
1939 1739 1958 1936 1977 1698 1996 1849 
1940 1615 1959 1906 1978 2008 1997 2007 
1941 1838 1960 1736 1979 1970 1998 1903 
1942 1762 1961 1970 1980 1758 1999 1970 
1943 1709 1962 1849 1981 1774 2000 1905 
1944 1921 1963 1774 1982 1728 2001 1890 
1945 1581 1964 1606 1983 1920 2002 2085 
1946 1834 1965 1735 1984 1823 2003 1780 
1947 1890 1966 1883 1985 1822 2004 1900 
1948 1898 1967 1642 1986 1860 2005 1930 
1949 1958 1968 1713 1987 1732 2006 1910 
1950 1830 1969 1742 1988 1906 2007 2270 
1951 1864 1970 1751 1989 1932 2008 2078 
1952 1829 1971 1879 1990 1861 2009 1900 
1953 1652 1972 1736 1991 1801 2010 1813 
1954 1589 1973 1861 1992 1793 


26.4 Testing the Hypothesis of No Linear Relationship 


Example 26.1 asked, “Do children with higher crying counts tend to have 
higher IQs?” Data analysis supports this conjecture. But is the positive as- 
sociation statistically significant? That is, is it too strong to occur often just 
by chance? To answer this question, test hypotheses about the slope B of the 
population regression line: 


Hy: B = 0 
H,: B >0 


A regression line with slope 0 is horizontal. That is, the mean of y does not change 
at all when x changes. So Hp says that there is no linear relationship between x and 
y in the population. Put another way, Hp says that linear regression of y on x is of no 
value for predicting y. 

The test statistic is just the standardized version of the least-squares slope b, using 
the hypothesized value B = O for the mean of b. It is another t statistic. Here are 
the details. 


Significance Test for Regression Slope 


To test the hypothesis H,: B = O, compute the t statistic 

_ 6b 

~ SE, 

In this formula, the standard error of the least-squares slope b is 


SE, = ee een 

Wie = m)P 
The sum runs over all observations on the explanatory variable x. In terms of a random 
variable T having the t(n — 2) distribution, the P-value for a test of Hp against 


fi 


H:B>0 is P(T=t) 


t 
H,: 8B <0 is P(T St) ( \ 
t 


H,: B# 0 is 2P(T= |t|) 


As advertised, the standard error of b is a multiple of the regression standard 
error s. The degrees of freedom n — 2 are the degrees of freedom of s. Although we 
give the formula for this standard error, you should not try to calculate it by hand. 
Regression software gives the standard error SE, along with b itself. 
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The hypothesis Ho: B = O says that crying has no straight-line relationship with IQ. 
We conjecture that there is a positive relationship, so we use the one-sided alterna- 
tive Hj: B > O. 
Figure 26.1 shows that there is a positive relationship, and from Figure 26.3 we 
see b = 1.4929 and SE, = 0.4870. Thus, 
b 1.4929 


t= = = 3.07 
SE, 0.4870 


so it is not surprising that all the outputs in Figure 26.3 give t = 3.07 with two-sided 
P-value 0.004. The P-value for the one-sided test is half of this, P = 0.002. There is 
very strong evidence that IQ increases as the intensity of crying increases. m 


Apply Your Knowledge 


26.4 Wine and Cancer in Women. Exercise 26.1 gives data on daily wine 
consumption and the relative risk of breast cancer in women. Software 
tells us that the least-squares slope is b = 0.009012 with standard error 
SE, = 0.001112. 


(a) What is the t statistic for testing Hp: B = 0? 

(b) How many degrees of freedom does t have? Use Table C to approxi- 
mate the P-value of t against the one-sided alternative H,: B > 0. 
What do you conclude? 


26.5 Great Arctic Rivers: Testing. The most important question we ask of 
the data in Table 26.2 is this: is the increasing trend visible in your plot 
(Exercise 26.3) statistically significant? If so, changes in the Arctic may 
already be affecting the earth’s climate. Use software to answer this ques- 
tion. Give a test statistic, its P-value, and the conclusion you draw from the 
test. lh ARCTIC 


26.6 Does Fast Driving Waste Fuel? Exercise 4.8 (page 109) gives data on 
the fuel consumption of a small car at various speeds from 10 to 150 kilo- 
meters per hour. Is there significant evidence of straight-line dependence 
between speed and fuel use? Make a scatterplot and use it to explain the 
result of your test. FASTDR 


The least-squares slope b is closely related to the correlation r between the 
explanatory and response variables x and y. In the same way, the slope B of 
the population regression line is closely related to the correlation between x 
and y in the population. In particular, the slope is 0 exactly when the correla- 
tion is 0. 

Testing the null hypothesis Hy: 8 = 0 is therefore exactly the same as testing 
that there is no correlation between x and y in the population from which we drew 
our data. You can use the test for zero slope to test the hypothesis of zero correlation 
between any two quantitative variables. That’s a useful trick. 


26.5 Testing Lack of Correlation 621 


Because correlation also makes sense when there is no explanatory—response 
distinction, it is handy to be able to test correlation without doing regression. Table 
E in the back of the book gives critical values of the sample correlation r under the 
null hypothesis that the correlation is 0 in the population. Use this table when both 
variables have at least approximately Normal distributions or when the sample size 
is large. 


Figure 26.5 displays two scatterplots that we will use to illustrate testing lack of 
correlation and also to illustrate once again the need for formal statistical tests. On 
the left are data from an experiment on the healing of cuts in the limbs of newts. 
The data are the healing rates (micrometers per hour) for the two front limbs of 18 
newts. The right-hand scatterplot shows the first- and second-round scores for the 92 
golfers in the 2013 Masters Tournament. (There are fewer than 92 points because 
of duplicate scores.) 
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FIGURE 26.5 


Two scatterplots for inference about the population correlation, for Example 26.5. (a) Healing rates for the 
two front limbs of 18 newts. (b) Scores on the first two rounds of the 2013 Masters Tournament. 


We will test the hypotheses 
Ho: population correlation = O 
H,: population correlation ~ O 


for both sets of data. (The Masters scores are all whole numbers, but with n = 92 
the robustness of t procedures allows their use.) Software gives 


0.3581 t 
0.2459 t 


newts r 1.5342 P 


2.41 P 


0.1445 
0.0181 


masters r 
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The two-sided P-values for the t statistic for testing slope O are also the two-sided 
P-values for testing correlation O. 

Without software, compare the correlation r = 0.3581 for newts with the critical 
values in the n = 18 row of Table E. It falls between the table entries for one-tail 
probabilities 0.05 and 0.10, so the two-sided P-value lies between 0.10 and 0.20. 
For the Masters data, use the n = 80 row of Table E. (There is no table entry for 
sample size n = 92, so we use the next smaller sample size.) The two-sided P-value 
lies between 0.01 and 0.02. = 


The evidence for nonzero correlation is strong for the Masters scores (t = 2.41, 
P = 0.0181) but not for newts (t = 1.53, P = 0.1445). Yet the correlation for the 
newts is larger than that for the Masters, and the scatterplots suggest similar linear 
relationships for both. What happened? The larger sample size for the Masters data 
is largely responsible. The same r will have a smaller P-value for n = 92 than for 
n = 18. Our eyeball impression, even aided by calculating 7, can’t assess signifi- 
cance. We need the P-value from a formal test to guide us. 


() LaunchPad Online Resources 


* The StatBoards video, Tests of Significance for the Slope, discusses 
the test of significance for the slope and its equivalence to the test 
of correlation in an example. 


Apply Your Knowledge 


26.7 Wine and Cancer in Women: Testing Correlation. Exercise 26.1 gives 
data showing that the risk of breast cancer increases linearly with daily wine 
consumption. There are only four observations, so we worry that the appar- 
ent relationship may be just chance. Is the correlation significantly greater 
than zero? Answer this question in two ways. CANCER 


(a) Return to your t statistic from Exercise 26.4. What is the one-sided 
P-value for this t? Apply your result to test the correlation. 


(b) Find the correlation 7, and use Table E to approximate the P-value 
of the one-sided test. 


26.8 Does Social Rejection Hurt? Exercise 4.46 (page 125) gives data from a 
study of whether social rejection causes activity in areas of the brain that are 
known to be activated by physical pain. The explanatory variable is a sub- 
ject’s score on a test of “social distress” after being excluded from an activity. 
The response variable is activity in an area of the brain that responds to phys- 
ical pain. Your scatterplot (Exercise 4.46) shows a positive linear relationship. 
The research report gives the correlation r and the P-value for a test that r is 
greater than zero. What are r and the P-value? (You can use Table E or you 
can get more accurate P-values for the correlation from regression software.) 
What do you conclude about the relationship? El) REJECT 


The slope B of the population regression line is usually the most important 
parameter in a regression problem. The slope is the rate of change of the mean 
response as the explanatory variable increases. We often want to estimate B. The 


26.6 Confidence Intervals for the Regression Slope 


slope b of the least-squares line is an unbiased estimator of B. A confidence inter- 
val is more useful because it shows how accurate the estimate b is likely to be. The 
confidence interval for B has the familiar form 


estimate a ag =) eer 


Because b is our estimate, the confidence interval is b + t*SE,. Here are the 
details. 


Confidence Interval for Regression Slope 


A level C confidence interval for the slope B of the population regression line is 
() 32 Se, 


Here t* is the critical value for the t(n — 2) density curve with area C between —t* 
and t*. The formula for SE, appears in the box on page 619. 


All the software outputs in Figure 26.3 give the slope b = 1.4929 (or b = 1.493), 
and three also give the standard error SE, = 0.4870. The outputs giving both use 
a similar arrangement, a table in which each regression coefficient is followed by its 
standard error. Excel also gives the lower and upper endpoints of the 95% confi- 
dence interval for the population slope B, 0.505 and 2.481. 

Once we know b and SE,, it is easy to find the confidence interval. There are 38 
data points, so the degrees of freedom are n — 2 = 36. Because Table C does not 
have a row for df = 36, we must use either software or the next smaller degrees of 
freedom in the table, df = 30. To use software, enter 36 degrees of freedom. For 
95% confidence, enter the cumulative proportion 0.975 that corresponds to upper- 
tail area 0.025. Minitab gives 


Student’s t distribution with 36 DF 
P( X <= x ) x 
0.975 2.02809 


The 95% confidence interval for the population slope B is 
b + t*SE, = 1.4929 + (2.02809)(0.4870) 
= 1.4929 + 0.9877 
= 0.505 to 2.481 


This agrees with Excel’s result. We are 95% confident that mean IQ increases by 
between about 0.5 and 2.5 points for each additional peak in crying. m 


You can find a confidence interval for the intercept a of the population regres- 
sion line in the same way, using a and SE, from the “Constant” line of the Minitab 
output or the “Intercept” line in Excel, JMP, and Crunchlt!. We rarely need to 
estimate @. 


() LaunchPad Online Resources 


* The StatBoards video, Confidence Intervals for the Slope, discusses 
confidence intervals for the slope in an example. 
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Apply Your Knowledge 


26.9 Wine and Cancer in Women: Estimating Slope. Exercise 26.1 
gives data on wine consumption and the risk of breast cancer. Software 
tells us that the least-squares slope is b = 0.009012 with standard error 
SE, = 0.001112. Because there are only four observations, the observed 
slope b may not be an accurate estimate of the population slope B. Give a 
90% confidence interval for B. 


26.10 Introspection and Gray Matter: Estimating Slope. Exercise 26.2 gives 
data on introspective ability and gray-matter volume of the brains of sub- 
jects. We want a 95% confidence interval for the slope of the population 
regression line. Starting from the information in the JMP output in Figure 
26.4, find this interval. Say in words what the slope of the population re- 
eression line tells us about the relationship between Aroc and gray-matter 
volume. 


26.11 Great Arctic Rivers: Estimating Slope. Use the data in Table 26.2 
to give a 90% confidence interval for the slope of the population re- 
gression of Arctic river discharge on year. Does this interval convince 
you that discharge is actually increasing over time? Explain your answer. 


ely) ARCTIC 


One of the most common reasons to fit a line to data is to predict the response to 
a particular value of the explanatory variable. This is another setting for regression 
inference: we want not simply a prediction, but a prediction with a margin of error 
that describes how accurate the prediction is likely to be. 


(STATE? The EESEE story “Blood Alcohol Content” describes a study in which 16 
42 student volunteers at the Ohio State University drank a randomly assigned number 


of cans of beer. Thirty minutes later, a police officer measured their blood alcohol 
content (BAC) in grams of alcohol per deciliter of blood. Here are the data:° 


Student 1 2 3 4 5 6 7 8 
Beers 5 2 9 8 3 t 3 5 
BAC 0.10 0.03 O19 O12 0.04 0.095 0.07 0.06 
Student 9 10 11 12 13 14 15 16 
Beers 3 5 4 6 5 7 1 4 
BAC 0.02 0.05 0.07 0.10 0.085 0.09 0.01 0.05 


The students were equally divided between men and women and differed in weight 
and usual drinking habits. Because of this variation, many students don’t believe 
that number of drinks predicts blood alcohol well. Steve thinks he can drive legally 
30 minutes after he finishes drinking five beers. The legal limit for driving is BAC 0.08 
in all states. We want to predict Steve’s blood alcohol content, using no information 
except that he drinks five beers. 


PLAN: Regress BAC on number of beers. Use the regression line to predict 
Steve’s BAC. Give a margin of error that allows us to have 95% confidence in 


our prediction. 


© James Shaffer/PhotoEdit 
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FIGURE 26.6 

Scatterplot of students’ blood alcohol 
content against the number of cans of 
beer consumed, with the least-squares 
regression line, for Example 26.7. 
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Minitab regression output for the blood 
Regression Analysis: BAC versus Beers alcohol content data, for Example 26.7. 


The regression equation is 


BAC = -0.0127 + 0.0180 Beers 

Predictor Coef SE Coef Ay P 
Constant -0.01270 0.01264 -1.00 ORS 82 
Beers 0.017964 0.002402 7.48 0.000 


S = 0202 04410 R-Sq = 80.0% R-Sq(adj) = 78.6% 
Predicted Values for New Observations 


New 
Obs Fit SE Fit 95% CL 953. Pr 
al OLO77L2" OC00S13: (COn06612;" 0.08812) COR0S LODE OF 2 2'3:2)) 


BORA The scatterplot in Figure 26.6 and the regression output in Figure 26.7 
show that student opinion is wrong: number of beers predicts blood alcohol content 
quite well. In fact, r° = 0.80, so that number of beers explains 80% of the observed 
variation in BAC. To predict Steve’s BAC after five beers, use the equation of the 
regression line: 


~0.0127 + 0.0180x 
~0.0127 + 0.0180(5) = 0.077 


< 
II 


That's dangerously close to the legal limit of 0.08. What about 95% confidence? 
The “Predicted Values” part of the output in Figure 26.7 shows two 95% intervals. 
Which should we use? @ 


626 CHAPTER 26 


IS REGRESSION 
GARBAGE? 


No—but garbage can be the setting 
for regression. The U.S. Census 
Bureau once asked if weighing a 
neighborhood’s garbage would help 
count its people. So 63 households 
had their garbage sorted and 
weighed. It turned out that pounds 
of plastic in the trash gave the best 
garbage prediction of the number 
of people in a neighborhood. The 
margin of error for a 95% prediction 
interval in a neighborhood of about 
100 households, based on five 
weeks’ worth of garbage, was about 
+2.5 people. Alas, that is not 
accurate enough to help the U.S. 
Census Bureau. 

prediction interval 


ay MAY THE LONGER 
NAME WIN! 


Regression is far from perfect, but it 
beats most other ways of predicting. 
A writer in the early 1960s noted a 
simple method for predicting U.S. 
presidential elections: just choose 
the candidate with the longer name. 
In the 22 elections from 1876 to 
1960, this method failed only once. 
Let’s hope that the writer didn’t 

bet the family silver on this idea. 
The 13 elections from 1964 to 2012 
presented 11 tests of the “long name 
wins” method (the 1980 candidates 
and the 2000 candidates had names 
of the same length). The longer 
name lost eight of the 11. 


Inference for Regression 


To decide which interval to use, you must answer this question: do you want to 

predict the mean BAC for all students who drink five beers, or do you want to pre- 

dict the BAC of one individual student who drinks five beers? Both of these 

A predictions may be interesting, but they are two different problems. The actual 

prediction is the same, y = 0.077. But the margin of error is different for 

the two kinds of prediction. Individual students who drink five beers don’t all have 

the same BAC. So we need a larger margin of error to pin down Steve’s result than 
to estimate the mean BAC for all students who have five beers. 

Write the given value of the explanatory variable x as x*. In Example 26.7, x 
The distinction between predicting a single outcome and predicting the mean of all 
outcomes when x = x* determines what margin of error is correct. To emphasize 
the distinction, we use different terms for the two intervals. 


* 


» To estimate the mean response, we use a confidence interval. It is an ordinary 
confidence interval for the mean response when x has the value x*, which 
is “, = a + Bx*. This is a parameter, a fixed number whose value we don't 
know. 


To estimate an individual response y, we use a prediction interval. A 
prediction interval estimates a single random response y rather than a 
parameter like y,. The response y is not a fixed number. If we took more 
observations with x = x*, we would get different responses. 


Steve is one individual, so we must use the prediction interval. The output in Figure 
26.7 helpfully labels the confidence interval as “95% Cl” and the prediction interval 
as “95% PI.” We are 95% confident that Steve's BAC after five beers will lie between 
0.032 and 0.122. The upper part of that range will get him arrested if he drives. 
The 95% confidence interval for the mean BAC of all students who drink five beers 
is much narrower, 0.066 to 0.088. m 


The meaning of a prediction interval is very much like the meaning of a confi- 
dence interval. A 95% prediction interval, like a 95% confidence interval, is right 
95% of the time in repeated use. “Repeated use” now means that we take an obser- 
vation on y for each of the n values of x in the original data and then take one more 
observation y with x = x*. Form the prediction interval from the n observations, 
then see if it covers the one more y. It will in 95% of all repetitions. 

The interpretation of prediction intervals is a minor point. The main point is 
that it is harder to predict one response than to predict a mean response. Both 
intervals have the usual form 


5 + t*SE 


but the prediction interval is wider than the confidence interval because individuals 
are more variable than averages. As a further illustration, consider a professional 
athlete such as the basketball player Kobe Bryant. You can find his career statistics 
online. During his career as a starter, his season scoring average ranges from 7.6 to 
35.4 points per game, but individual game performances range from 0 points to a 
career high of 81 points. Individual game performances of professional athletes are 
more variable than season average performances. So the margin of error for predic- 
tions of individual game performances will be larger than for predictions of season 
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averages. You will rarely need to know the details because software automates the 
calculation, but here they are. 


Confidence and Prediction Intervals 


for Regression Response 


A level C confidence interval for the mean response p, when x takes the value 


x** is 
iene are) th 
The standard error SE, is 
1 (x* —xp 
SE, = = 
EN Ge = wy 


A level C prediction interval for a single observation y when x takes the value x* is 
y in t*SEs 
The standard error for prediction SE; is 


1, @=%) 
a + — + —— 
SE; ss 1) SGae: 


In both intervals, t* is the critical value for the t(n — 2) density curve with area C 
between —t* and t*. 


There are two standard errors: SE, for estimating the mean response p,, and 
SE; for predicting an individual response y. The only difference between the two 
standard errors is the extra | under the square root sign in the standard error for 
prediction. The extra 1 makes the prediction interval wider. Both standard errors 
are multiples of the regression standard error s. The degrees of freedom are again 
n — 2, the degrees of freedom of s. 


() LaunchPad Online Resources 


* The StatBoards video, Confidence and Prediction Intervals, uses an 
example to make clear the difference in these two intervals. 


* The Snapshots video, Regression Inference, discusses inference for 
regression. 


Apply Your Knowledge 


26.12 Wine and Cancer in Women: Prediction. Exercise 26.1 gives data on 
wine consumption and the risk of breast cancer. For a new group of women 
who drink an average of 10 grams of wine per day, predict their relative risk 
of breast cancer. 


(a) Figure 26.8 is part of the output from Minitab for prediction when 
x* = 10.0. Which interval in the output is the proper 95% interval 
for predicting the relative risk? 


(b) Minitab gives only one of the two standard errors used in prediction. 
It is SE,, the standard error for estimating the mean response. Use 
this fact along with the output to give a 90% confidence interval for 
the mean relative risk of breast cancer in all women who drink an 
average of 10 grams of wine per day. 
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FIGURE 26.8 


CHAPTER 26 a 


Partial Minitab output for regressing 
relative risk of breast cancer on mean 
daily intake of wine, for Exercise 26.12. 


Inference for Regression 


A Session 


Predictor Coef SE Coef 1p P 
Constant 0299309 On OM ere: 55.88 0.000 
Intake 0.009012 OR Owe? tees li(0) 0.015 


Predicted Values for New Observations 


New 
Obs Fit SE Fit 95%: Cr 95% PI 
I 0832 Us0L057 (ha0s775,, Tet 86S )OR9 864370 Was. 000) 


al] 
| | | Bl 
26.13 Introspection and Gray Matter: Prediction. Analysis of the data in 
Exercise 26.2 shows that the relationship of introspective ability, as mea- 
sured by Aroc, and gray-matter volume is roughly linear. We might want to 
predict the mean gray-matter volume of a person with an Aroc of 0.65. Here 

is the JMP output for prediction when x* = 0.65: 


ha Predicted StdErr Pred Lower 95% Upper 95% StdErr Indiv Lower 95% Upper 95% 
Aroc | Volume Volume Volume Mean Volume | Mean Volume Volume Indiv Volume | Indiv Volume 
0.65 e | 65.985549133 | 1.4690245109 | 62.971359812 | 68.999738454 | 8.044072016 | 49.480476694 | 82.490621572 


residual plot 


(a) Use the regression line from Figure 26.4 to verify that “Predicted 
volume” is the predicted value for x* = 0.65. (Start with the results 
in the “Estimate” column of Figure 26.4 to reduce roundoff error.) 


(b) What is the 95% interval we want? 


You can fit a least-squares line to any set of explanatory-response data when 
both variables are quantitative. If the scatterplot doesn’t show a roughly linear 
pattern, the fitted line may be almost useless. But it is still the line that fits the 

data best in the least-squares sense. To use regression inference, however, 
A the data must satisfy additional conditions. Before you can trust the results 

of inference, you must check the conditions for inference one by one. There 
are ways to deal with violations of any of the conditions. If you see a clear viola- 
tion, get expert advice. 

Although the conditions for regression inference are a bit elaborate, it is not hard 
to check for major violations. The conditions involve the population regression line 
and the deviations of responses from this line. We can’t observe the population line, 
but the least-squares line estimates it and the residuals estimate the deviations from 
the population line. You can check all the conditions for regression inference by looking 
at graphs of the residuals. This is what we recommend in practice (and a failure to do 
so can sometimes lead to unjustified conclusions), and most regression software will 
calculate and save the residuals for you. Start by making a stemplot or histogram of 


the residuals and also a residual plot, a plot of the residuals against the explanatory 
variable x, with a horizontal line at the “residual = 0” position. The “residual = 0” 
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line represents the position of the least-squares line in the scatterplot of y against x. 
Let’s look at each condition in turn. 


The relationship is linear in the population. Look for curved patterns or 
other departures from a straight-line overall pattern in the residual plot. You 
can also use the original scatterplot, but the residual plot magnifies any effects. 


The response varies Normally about the population regression line. Because 
different y-values usually come from different x-values, the responses themselves 
need not be Normal. It is the deviations from the population line—estimated by 
the residuals—that must be Normal. Check for clear skewness or other major 
departures from Normality in your stemplot or histogram of the residuals. 


Observations are independent. In particular, repeated observations on the 
same individual are not allowed. You should not use ordinary regression to 
make inferences about the growth of a single child over time, for example. 
Signs of dependence in the residual plot are a bit subtle, so we usually rely on 
common sense. 


The standard deviation of the responses is the same for all values of x. Look 
at the scatter of the residuals above and below the “residual = 0” line in the 
residual plot. The scatter should be roughly the same from one end to the other. 
You will sometimes find that, as the response y gets larger, so does the scatter of 
the residuals. Rather than remaining fixed, the standard deviation a about the 
line changes with x as the mean response changes with x. There is no fixed o for 
s to estimate. You cannot trust the results of inference when this happens. 


You will always see some irregularity when you look for Normality and fixed 
standard deviation in the residuals, especially when you have few observations. 
Don’t overreact to minor violations of the conditions. Like other t procedures, infer- 
ence for regression is (with one exception) not very sensitive to lack of Normality, 
especially when we have many observations. Do beware of influential observations, 
which can greatly affect the results of inference. 

The exception is the prediction interval for a single response y. This inter- 
val relies on Normality of individual observations, not just on the approximate 
Normality of statistics like the slope a and intercept b of the least-squares line. 
The statistics a and b become more Normal as we take more observations. This 
contributes to the robustness of regression inference, but it isn’t enough for the 

prediction interval. We will not study methods that carefully check 
A Normality of the residuals, so you should regard prediction intervals as 
rough approximations. 


(STATE?) As the climate grows warmer, we expect many animal species to move to- a, 
ward the poles in an attempt to maintain their preferred temperature range. Do data atin] 42 
on fish in the North Sea confirm this expectation? Table 26.3 gives data for 25 years 
on mean winter temperatures at the bottom of the North Sea (degrees Celsius) and 
the center of the distribution of anglerfish in degrees of north latitude.® 


“PLAN: Regress latitude on temperature. Look for a positive linear relationship and 
assess its significance. Be sure to check the conditions for regression inference. 


| SOLVE: fine scatterplot in Figure 26.9 shows a clear positive linear relationship. 
The solid line in the plot is the least-squares regression line of the center of the fish 
distribution (north latitude) on winter ocean temperature. Software shows that the 
slope is b = 0.818. That is, each degree of ocean warming moves the fish about 
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YEAR TEMP. LATITUDE | YEAR TEMP. LATITUDE | YEAR TEMP. LATITUDE 
1977 6.26 57.20 1986 6.52 57.72 1994 7.02 58.71 
1978 6.26 57.96 1987 6.68 57.83 125 0s 58.07 
1979 6.27 57.65 1988 6.76 57.87 1996 7.13 58.49 
1980 6.31 Dios 1989 6.78 57.48 1997 7.15 58.28 
1981 6.34 58.01 1990 6.89 58.13 1998 7.29 58.49 
1982 6.32 59.06 1991 6.90 58.52 1999 7.34 58.01 
1983 6.37 56.85 1992 6.93 58.48 2000 7.57 58.57 
1984 6.39 56.87 1993 6.98 57.89 2001 7.65 58.90 
1985 6.42 57.43 


© Jurgen Freund/naturepl.com 


0.8 degree of latitude farther north. The t statistic for testing Ho: B =O is 
t = 3.6287 with one-sided P-value P = 0.0007. There is very strong evidence that 
the population slope is positive, B > O. 
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North latitude 


FIGURE 26.9 

Scatterplot of the latitude of the center 
of the distribution of anglerfish in 

the North Sea against mean winter 
temperature at the bottom of the sea, 
for Example 26.9. The two regression 6.0 6.5 7.0 7.5 8.0 
lines are for the data with (solid) and Temperature (degrees Celsius) 

without (dashed) Observation 6. 


| CONCLUDE: Miitemertel give highly significant evidence that anglerfish have moved 
north as the ocean has grown warmer. Before relying on this conclusion, we must 
check the conditions for inference. m 


The software that did the regression calculations also finds the 25 residuals. In 
the same order as the observations in Example 26.9, they are 


=0).3731 0.3869 0.0687 -0.0240 0.3714 1.4378 -0.8131 

-0.8095 -0.2740 -0.0658 -0.0867 -0.1121 -0.5185 0.0415 
0.4234 0.3588 -0.2721 0.5152 -0.1821 0.2052 -0.0211 
0.0743 -0.4466 -0.0747 0.1899 
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Number of observations 


=100) “O75 -—O50 —025 Of) O05 ©50 Of 100) AS) 1.50 
Regression residual 


Begin by making two graphs of the residuals. Figure 26.10 is a histogram of the 
residuals. Figure 26.11 is the residual plot, a plot of the residuals against the explana- 
tory variable, sea-bottom temperature. The “residual = 0” line marks the position of 
the regression line. Patterns in residual plots are often easier to see if you use a wider 
vertical scale than your software’s default plot, and we suggest you do so if possible. 
Both graphs show that Observation 6 is a high outlier. Let’s check the conditions 
for regression inference. 
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10 
a 
3 
™m O5 od 
uv e 
= ee e 
= 
2 e e 
g e e 
v e 
af 1 Ss : : 
a e e 
e 
e 
es 
oe 
6.5 7.0 75 8.0 


Temperature (degrees Celsius) 


FIGURE 26.10 

Histogram of the residuals from the 
regression of latitude on temperature 
in Example 26.9. 


FIGURE 26.11 
Residual plot for the regression of latitude 
on temperature in Example 26.9. 
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« Linear relationship. The scatterplot in Figure 26.9 and the residual plot in 
Figure 26.11 both show a linear relationship except for the outlier. 


= Normal residuals. The histogram in Figure 26.10 is roughly symmetric and 
single peaked. There are no important departures from Normality except for 
the outlier. 


= Independent observations. The observations were taken a year apart, so we 
are willing to regard them as close to independent. The residual plot shows 
no obvious pattern of dependence, such as runs of points all above or all 
below the line. 


= Constant standard deviation. Again excepting the outlier, the residual plot 
shows no unusual variation in the scatter of the residuals above and below 
the line as x varies. 


The outlier is the only serious violation of the conditions for inference. How 
influential is the outlier? The dashed line in Figure 26.9 is the regression line 
without Observation 6. Because there are several other observations with similar 
values of temperature, dropping Observation 6 does not move the regression line 
very much. Even though the outlier is not very influential for the regression line, it 
influences regression inference because of its effect on the regression standard error. 
The standard error is s = 0.4734 with Observation 6 and s = 0.3622 without 

it. When we omit the outlier, the t statistic changes from t = 3.6287 

A to t = 5.5599, and the one-sided P-value changes from P = 0.0007 to 

P < 0.00001. Fortunately, the outlier does not affect the conclusion 

we drew from the data. Dropping Observation 6 makes the test for the popula- 

tion slope more significant and increases the percent of variation in fish location 
explained by ocean temperature. 

One more caution about inference in this example: as usual in an observational 
study, the possibility of lurking variables makes us hesitant to conclude that rising 
temperature is causing anglerfish to move north. Ocean temperature was steadily 
rising during these years. The effect on fish latitude of any lurking variable that 
increased over time—perhaps increased commercial fishing—is confounded with 
the effect of temperature. 


() LaunchPad Online Resources 


* The StatBoards video, Checking Conditions for Inference, uses an 
example to illustrate checking conditions for inference. 


Apply Your Knowledge 


26.14 Crying and IQ: Residuals. The residuals for the study of crying and IQ 
appear in Example 26.3. CRYINGR 


(a) Make a stemplot to display the distribution of the residuals (round 
to the nearest whole number). Are there strong outliers or other signs 
of departures from Normality? 


(b) Make a residual plot of residuals against crying peaks. Try a vertical 
scale of — 60 to 60 to show patterns more clearly. Draw the “residual 
= 0” line. Does the residual plot show clear deviations from a linear 
pattern or clearly unequal variability about the line? 


26.15 
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(c) Using the information given in Example 26.1, explain why the 
38 observations are independent. 
Introspection and Gray Matter: Residuals. Figure 26.4 gives part of 
the JMP output for the data on introspective ability and gray-matter volume 
in Exercise 26.2. Figure 26.12 comes from another part of the output. It 
gives x, y, the predicted response 4, the residual y — y, and related quanti- 
ties for each of the 29 observations. Most statistical software provides simi- 
lar output. Examine the conditions for regression inference one by one. This 
example illustrates mild violations of the conditions that did not prevent 
the researchers from doing inference. £j),) INTRSPR 
(a) Linear relationship. Your scatterplot and 7’ from Exercise 26.2 show 
that the relationship is roughly linear. Plot the residuals against Aroc. 
Are any deviations from a straight line apparent? 
(b) Normal variation about the line. Make a stemplot of the residuals 
(round to the third decimal place and don’t forget that —0.00 and 
0.00 are separate stems). With only 29 observations, a small amount 
of skew is not disturbing. Does your plot indicate that any of the 
observations are outliers? If so, which? 
(c) Independent observations. The data come from 29 different subjects 
who were each measured separately. Can you conclude that the 
observations are independent? 
(d) Variability about the line stays the same. Is there any evidence that 
the variability may be larger at one end? 
[vy] Predicted StdErr Pred Residual FIGURE 26.12 
Aroc | Volume Volume Volume Volume Residuals from JMP for Exercise 
i || ©.55 59 57.382466281 | 3.5852535563 | 1.6175337187 26.15. The table gives the predicted 
2 | 0.58 62 | 59.963391137 | 2.7113878012 | 2.0366088632 veallunavs ental Gheiestdlweall dor eae 
30 |ons9 43 | 60.823699422 | 2.4401564173 | -17.82369942 Bbeervation. 
4 | 0.59 63 | 60.823699422 | 2.4401564173 | 2.176300578 
5m |Wowss 83 | 60.823699422 | 2.4401564173 | 22.176300578 
6 | oc. 61 | 62.544315992 | 1.9532437971 | -1.544315992 
7 || ae 55 | 63.404624277 | 1.7530601213 | -8.404624277 
g | 0.63 57 | 64.264932563 | 1.5967836073 | -7.264932563 
9 | 0.63 57 | 64.264932563 | 1.5967836073 | -7.264932563 
10 | 0.63 67 | 64.264932563 | 1.5967836073 | 2.7350674374 
i || Oc 72 | 64.264932563 | 1.5967836073 | 7.7350674374 
12 | 0.64 62 | 65.125240848 | 1.4982174843 | -3.125240848 
130 |Mowes 58 | 65.985549133 | 1.4690245109 | -7.985549133 
14 | 0.65 62 | 65.985549133 | 1.4690245109 | -3.985549133 
i | 0. 65 | 65.985549133 | 1.4690245109 | -0.985549133 
ie ones 70 | 65.985549133 | 1.4690245109 | 4.0144508671 
@ | O65 75 | 65.985549133 | 1.4690245109 | 9.0144508671 
1g | 0.66 60 | 66.845857418 | 1.5132250456 | -6.845857418 
19 | 0.66 63. | 66.845857418 | 1.5132250456 | -3.845857418 
AO || Oae7 71 | 67.706165703 | 1.6248405171 | 3.2938342967 
a || O.GH 71 | 67.706165703 | 1.6248405171 | 3.2938342967 
|| 0.67 80. | 67.706165703 | 1.6248405171 | 12.293834297 
a || 0.68 68 | 68.566473988 | 1.7913132096 | -0.566473988 
24 | 0.69 72 | 69.426782274 | 1.9989844836 | 2.5732177264 
a5 || Way 66 | 70.287090559 | 2.2364066174 | -4.287090559 
36 | OF 73. | 70.287090559 | 2.2364066174 | 2.7129094412 
a || Oo 61 | 71.147398844 | 2.4951011589 | -10.14739884 
28 | 0.72 80. | 72.007707129 | 2.7691126141 | 7.9922928709 
29 | 0.75 75. | 74.588631985 | 3.6477372643 | 0.4113680154 
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Inference for Regression 


CHAPTER 26 SUMMARY 


Chapter Specifics 


Least-squares regression fits a straight line to data to predict a response variable 
from an explanatory variable x. Inference about regression requires more conditions. 


The conditions for regression inference say that there is a population regression line 
ty = a + Bx that describes how the mean response varies as x changes. The observed 
response y for any x has a Normal distribution with mean given by the population 
regression line and with the same standard deviation o for any value of x. Observations 
on y are independent. 


The parameters to be estimated are the intercept @ and the slope B of the population 
regression line and also the standard deviation a. The slope a and intercept b of the least- 
squares line estimate a and B. Use the regression standard error s to estimate o. 


The regression standard error s has n — 2 degrees of freedom. All t procedures for 
regression inference have n — 2 degrees of freedom. 


To test the hypothesis that the slope is zero in the population, use the t statistic 
t = b/SE,. This null hypothesis says that straight-line dependence on x has no value 
for predicting y. In practice, use software to find the slope b of the least-squares line, 
its standard error SE,, and the t statistic. 


The t test for regression slope is also a test for the hypothesis that the population 
correlation between x and y is zero. To do this test without software, use the sample 
correlation r and Table E. 


Confidence intervals for the slope of the population regression line have the form 
be t*SE,. 

Confidence intervals for the mean response when x has value x* have the form 
y + t*SE,. Prediction intervals for an individual future response y have a similar 
form with a larger standard error, y + t*SE;. Software often gives these intervals. 


STATISTICS IN SUMMARY 


Here are the most important skills you should have acquired from reading this chapter. 


A. Preliminaries 


1. Make a scatterplot to show the relationship between an explanatory and a response 
variable. 


2. Use software or a calculator to find the correlation and the equation of the least- 
squares regression line. 


3. Recognize which type of inference you need in a particular regression setting. 


Inference Using Software Output 


1. Explain in any specific regression setting the meaning of the slope B of the popula- 
tion regression line. 

2. Understand software output for regression. Find in the output the slope and intercept 
of the least-squares line, their standard errors, and the regression standard error. 


3. Use that information to carry out tests of Ho: 8B = 0 and calculate confidence 
intervals for B. 


4. Explain the distinction between a confidence interval for the mean response and a 
prediction interval for an individual response. 


5. If software gives output for prediction, use that output to give either confidence or 
prediction intervals. 


C. Checking the Conditions for Regression Inference 


1. Make a stemplot or histogram of the residuals and look for strong departures from 
Normality. 


2. Make a residual plot and look for departures from a linear pattern or unequal vari- 
ability about the “residual = 0” line. 


3. Ask whether the study design suggests that observations are independent. 


Link It 


In Chapters 4 and 5, we studied scatterplots, correlation, and the least-squares regression line 
as methods for exploring data. In Chapter 5, we mentioned that we must exercise caution in 
how we interpret any relationships we observe through such exploratory analyses. We also 
mentioned that such interpretations rest on the assumption that the relationship is valid in 
some broader sense. And we promised to explore this more carefully later in this book. In this 
chapter, we did so by considering inference for regression. Inference for regression allows us to 
determine whether the relationship we observe in a scatterplot is valid for some larger popula- 
tion. It also allows us to attach margins of error to our estimates of the slope and intercept, as 
well as to make predictions based on the least-squares regression line. 

In Chapters 4 and 5, we discussed several cautions about correlation and the least- 
squares regression line. These same cautions apply to inference for regression. We discussed 
a systematic approach using the residuals and what we know of the study design to deter- 
mine if the assumptions behind inference for regression are reasonable. 

This chapter considers inference for relationships between two quantitative variables. 
Chapter 25 considered inference for relationships between two categorical variables. In 
the next chapter, we consider inference for relationships between a quantitative and a 
categorical variable—in particular, for deciding whether the mean of a response is the same 
for more than two categories. 


() LaunchPad Online Resources 


If you are having difficulty with any of the sections of this chapter, these online 
resources should help prepare you to solve the exercises at the end of this chapter. 


* StatTutor starts with a video review of each section and asks a series of 
questions to check your understanding. You may find this helpful if you need 
additional help understanding any of the material in this chapter. 


* LearningCurve provides you with a series of questions about the chapter 
geared to your level of understanding. 


Statistics in Summary 
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CHECK YOUR SKILLS 


Florida reappraises real estate every year, so the county appraiser’s 
website lists the current “fair market value” of each piece of prop- 
erty. Property usually sells for somewhat more than the appraised 


market value. Here are the appraised market values and actual i 2 
selling prices (in thousands of dollars) of condominium units sold 8 
in a beachfront building in a 137-month period between 2003 as 
and 2014:' 3 
5 
SELLING APPRAISED SELLING APPRAISED SELLING APPRAISED 
PRICE VALUE MONTH PRICE VALUE MONTH PRICE VALUE MONTH 
825 626 0 1164 953 59) 1083 601 101 
590 492 1 1425 922 64 1200 904 109 
1075 930 3 1865 1190 64 650 505 110 
890 790 9 1450 610 64 1000 877 113 
845 648 13 875 806 64 1100 895 118 
1100 942 14 1510 1241 70 549 481 119 
WS 345 15 1375) 813 70 1405 822 120 
1325 1032 19 560 496 73 1000 870 AL 
700 556 ZA 1050 HE: 79 1200 i 124 
1322 879 26 605 470 86 1100 678 125 
1900 1016 26 675 545 88 95) oli 129 
1600 1040 28 693 690 93 1200 888 129 
980 442 34 360 690 94 900 850 129 
940 771 37 1455 1132 96 825 516 134 
850 715 47 1068 867 99 886 626 sy 
1100 997 54 135 1009 100 
Here is part of the Minitab output for regressing selling price on 26.17 Whatis the correlation between selling price and ap- 


appraised value, along with prediction for a unit with appraised 


value $800,000: 

Predictor Coef SE Coef ae P 
Constant ii, 2 123.2 0.91 0.368 
appraised 1.2103 0.1540 7.86 0.000 


S = 221.341 R-Sq = 57.9% R-Sq(adj) = 56 

Predicted Values for New Observations 

New 

Obs Fit SE Fit se (eur 95% 
il al{e)tsKo) 3 ak s2.6 (1014.5, 1145.7) (629.5, 


Exercises 26.16 to 26.24 are based on this information. 


PI 
1530) 


26.16 The equation of the least-squares regression line for 
predicting selling price from appraised value is 


(a) price = 1.2103 + 111.9 X appraised value. 
(b) price = 111.9 + 1.2103 X appraised value. 
(c) price = 123.2 + 0.1540 X appraised value. 


26.18 


26.19 


praised value? 

(a) 0.761 (b) 0.579 (c) 0.569 

The slope 8 of the population regression line describes 
(a) the average selling price in a population of units 
when a unit’s appraised value is 0. 

(b) the average increase in selling price in a 
population of units when appraised value increases 
by $1000. 

(c) the exact increase in the selling price of an 
individual unit when its appraised value increases 
by $1000. 

Is there significant evidence that selling price in- 
creases as appraised value increases? To answer this 
question, test the hypotheses 

(a) Hy: 8B = O versus H,: B > 0. 

(b) Ho: B = 0 versus H,: B ¥ 0. 

(c) Hp: a = O versus H,:a > 0. 


26.20 Minitab shows that the P-value for this test is 
(a) 0.368. 
(b) 0.1540 
(c) less than 0.001. 


26.21 The regression standard error for these data is 
(a) 0.1540. 
(b) 123.2. 
(c) 221.341. 
26.22 Confidence intervals and tests for these data use the 


t distribution with degrees of freedom 


(a) 47. (b) 46. (c) 45. 
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26.23 A 95% confidence interval for the population slope 
Bis 

(a) 1.2103 
(b) 1.2103 


= 


447.3302. 
az (0), 
(c) 1.2103 + 0.1540. 


Louisa owns a unit in this building appraised at 
$800,000. The Minitab output includes prediction 
for this appraised value. She can be 95% confident 
that her unit would sell for between 


(a) $629,500 and $1,530,000. 
(b) $1,014,500 and $1,145,700. 
(c) $1,080,100 and $1,112,700. 


26.24 
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26.25 Genetically en- 
gineered cotton. 
A strain of geneti- 
cally engineered 
cotton, known as 
Bt cotton, is re- 
sistant to certain 
insects, which re- 
sults in larger yields of cotton. Farmers in northern 
China have increased the number of acres planted in 
Bt cotton. Because Bt cotton in resistant to certain 
pests, farmers have also reduced their use of insecti- 
cide. Scientists in China were interested in the long- 
term effects of Bt cotton cultivation and decreased 
insecticide use on insect populations that are not 
affected by Bt cotton. One such insect is the mirid 
bug. Scientists measured the number of mirid bugs 
per 100 plants and the proportion of Bt cotton plant- 
ed at 38 locations in northern China for the 12-year 
period from 1997 to 2008. The scientists reported a 
regression analysis as follows:§ 


number of mirid bugs per 100 plants 


0.54 + 6.81 X Btcotton planting proportion 
r= 0.90, P < 0.0001 


(a) What does the slope b = 6.81 say about the 
relation between Bt cotton planting proportion and 
number of mirid bugs per 100 plants? 

(b) What does 7? = 0.90 add to the information 
given by the equation of the least-squares line? 

(c) What null and alternative hypotheses do you 
think the P-value refers to? What does this 
P-value tell you? 

(d) Does the large value of r? and the small 
P-value indicate that increasing the proportion of 
acres planted in Bt cotton causes an increase in 
mirid bugs? 


© Softdreams/Dreamstime.com 


Exercise 7.51 (page 197) gives data from a study of the “gate veloc- 
ity” of molten metal that experienced foundry workers choose based 
on the thickness of the aluminum piston being cast. Gate velocity 
is measured in feet per second, and the piston wall thickness is in 
inches. A scatterplot (you need not make one) shows no outliers 
and a moderately strong positive linear relationship. Figure 26.13 
displays part of the Minitab regression output. Exercises 26.26 to 
26.28 analyze these data. 


26.26 Casting aluminum: Is there a_ relationship? 
Figure 26.13 leaves out the t statistics and their P-val- 
ues. Based on the information in the output, test the 
hypothesis that there is no straight-line relationship 
between thickness and gate velocity. State hypotheses, 
give a test statistic and its approximate P-value, and 


state your conclusion in the context of the problem. 


26.27 Casting aluminum: intervals. The output in 
Figure 26.13 includes prediction for piston wall thick- 
ness x* = 0.5 inch. Use the output to give 90% inter- 


vals for 


(a) the slope of the population regression line of 
gate velocity on piston thickness. 

(b) the average gate velocity for a type of piston 
with thickness 0.5 inch. 


26.28 Casting aluminum: residuals. The output in 
Figure 26.13 includes a table of the x and y variables, 
the fitted values ) for each x, the residuals, and 


some related quantities. £j),) ALUMRES 


(a) Plot the residuals against thickness (the explan- 
atory variable). Use vertical scale —200 to 200 so 
that the pattern is clearer. Add the “residual = 0” 
line. Does your plot show a systematically nonlinear 
relationship? Does it show systematic change in the 
variability about the regression line? 

(b) Make a histogram of the residuals. Minitab 
identifies the residual for Observation 9 as a sus- 
pected outlier. Does your histogram agree? 
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FIGURE 26.13 

Minitab output for the regression of gate 
velocity on piston thickness in casting 
aluminum parts, for Exercises 26.26 to 
26.28. 


m Session 


Regression Analysis: Veloc versus Thick 


The regression equation is 


Veloe = 10.4 + 275 Thrck 


Predictor Coef 
Constant 70.44 
Thaek QI 
Sra 5'6.- 3 64a R-Sq 
Obs Thick Veloc 
al 0.248 L238 

2} ORS 5 9) 22359 

3 03:66 LEO 

4 0.400 104.8 

5 Ou 24 228:..6 

6 O2S52 B23 R13 

yf: 0.628 326.2 

8 OR:69%7 3024 

9 On6:97, a5 22 
10 ORES 2 ZOOS ei, 
dial 0.806 302.4 
12 Om8 ea: 302/54 


R denotes an observation with a large 


SE 


Bae 
3,8) 
169% 
die 
180. 
214. 
Cnene 
243%, 
262. 
ZO2s 
Dihelis 
Pace 
290% 


OO FO: ©. OEE GS: iF o7 


SE 


R-Sq(adj) = 44.2% 


Fit Residual St Resid 


S278 -14.8 Onesie 
24.8 54.8 1.08 
24.3 sae) Os 19 
Vente SD: -1.46 
16.8 14.2 O26 
16.4 el 0.03 
iO) 83752 195) 
TOS a: 40.4 On. 
EO Fal = 16.78 =2;21R 
22.8 -14.0 =0.27 
26.4 LOD Oza 
27.4 6.4 Os 


standardized residual. 


Predicted Values for New Observations 


New 
Obs Fit SEB 
Al PH Oe ieee) 17.4 


(c) Redoing the regression without Observation 9 
gives regression standard error s = 42.4725 and 
predicted mean velocity 216 feet per second (90% 
confidence interval 191.4 to 240.6) for piston walls 
0.5 inch thick. Compare these values with those in 
Figure 26.13. Is Observation 9 influential for inference? 


Table 4.1 (page 107) gives 37 years’ data on boats registered in 
Florida and manatees killed by boats. Figure 4.2 (page 108) shows 
a strong linear relationship. The correlation is r = 0.954. Figure 
26.14 shows part of the Minitab regression output. Exercises 26.29 
to 26.31 analyze the manatee data. 


26.29 Manatees: Conditions for inference. We know that 
there is a strong linear relationship. Let’s check the 
other conditions for inference. Figure 26.14 includes 
a table of the two variables, the predicted values 
y for each x in the data, the residuals, and related 
quantities. 2yj,) MANATRS 


(a) Round the residuals to the nearest whole num- 
ber and make a stemplot. The distribution is single 
peaked and symmetric and appears close to Normal. 
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(T6225, 


90% PI 
(10059. 343) 


90% Cr 


239.4) 


(b) Make a residual plot of residuals against boats 
registered. Use a vertical scale from —20 to 20 to 
show the pattern more clearly. Add the “residual = 0” 
line. There is no clearly nonlinear pattern. The 
variability about the line may be a bit greater for 
larger values of the explanatory variable, but the 
effect is not large. 

(c) It is reasonable to regard the number of mana- 
tees killed by boats in successive years as indepen- 
dent. The number of boats grew over time. Some- 
one says that pollution also grew over time and 
may explain more manatee deaths. How would you 
respond to this idea? 


Manatees: Do more boats bring more kills? The 
output in Figure 26.14 omits the t statistics and 
their P-values. Based on the information in the 
output, is there good evidence that the number of 
manatees killed increases as the number of boats 
registered increases? State hypotheses and give a 
test statistic and its approximate P-value. What do 
you conclude? 


Session 


Regression Analysis: Kills versus Boats 


The regression equation is 


Kills = -44.8 + 0.132 Boats 
Predictor Coef SE Coef aL P 
Constant -44,831 5502 
Boats OMS 2259 0.007067 
See 5 Osa: R-Sq = 90.9% R-Sq(adj) = 90.7% 
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FIGURE 26.14 

Minitab output for the regression of 
number of manatees killed by boats on 
the number of boats (in thousands) 
registered in Florida, for Exercises 26.29 
10.26.31. 
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26.32 


26.33 
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Manatees: estimation. The output in Figure 26.14 
includes prediction of the number of manatees killed 
when there are 900,000 boats registered in Florida. 
Give 95% intervals for 


(a) the increase in the number of manatees killed 
for each additional 1000 boats registered. 

(b) the number of manatees that will be killed next 
year if there are 900,000 boats registered next year. 
Fidgeting keeps you slim: inference. Our first 
example of regression (Example 5.1, page 128) pre- 
sented data showing that people who increased their 
nonexercise activity (NEA) when they were deliber- 
ately overfed gained less fat than other people. Use 
software to add formal inference to the data analysis 
for these data. yy)) FATGAIN 


(a) Based on 16 subjects, the correlation between 
NEA increase and fat gain was r = —0.7786. Is 

this significant evidence that people with higher 
NEA increase gain less fat? (Report a t statistic from 
regression output and give the one-sided P-value.) 
(b) The slope of the least-squares regression line 
was b = —0.00344, so that fat gain decreased by 
0.00344 kilogram for each added calorie of NEA. 
Give a 90% confidence interval for the slope of the 
population regression line. This rate of change is 
the most important parameter to be estimated. 

(c) Sam’s NEA increases by 400 calories. His 
predicted fat gain is 2.13 kilograms. Give a 95% 
interval for predicting Sam’s fat gain. 

Predicting tropical storms. Exercise 5.61 (page 160) 
gives data on William Gray’s predictions of the num- 
ber of named tropical storms in Atlantic hurricane 
seasons from 1984 to 2013. Use these data for regres- 
sion inference as follows. £j),) storms2 


(a) Does Professor Gray do better than random 
guessing? That is, is there a significantly positive 
correlation between his forecasts and the actual 
number of storms? (Report a t statistic from regres- 
sion output, and give the one-sided P-value.) 

(b) Give a 95% confidence interval for the mean 
number of storms in years when Professor Gray fore- 
casts 16 storms. 

Coral growth. Sea surface temperatures across 
much of the tropics have been increasing since the 
mid-1970s. At the same time, the growth of coral has 
been decreasing. Scientists examined data on mean 
sea surface temperatures (SST) in degrees Celsius 
and mean coral growth in millimeters (mm) per year 
over a several-year period at locations in the Red 
Sea. Here are the data:’ £yy) 


SST 


29.68 29.87 30.16 30.22 30.48 30.65 30.90 


Growth 


JESSE) Peois) PAS) PASO) His) Shs) YK) 
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(a) Do the data indicate that coral growth decreases 
linearly as SST increases? Is this change statistically 
significant? 

(b) Use the data to predict with 95% confidence 
the mean coral growth (millimeters per year) when 
SST is 30.0 degrees Celsius. 


Predicting tropical storms: residuals. Make a 
stemplot of the residuals (round to the nearest tenth) 
from your regression in Exercise 26.33. Explain why 
your plot suggests that we should not use these data 
to get a prediction interval for the number of storms 
in a single year. jj) storms2 


Coral growth: residuals. Do the data in Exercise 
26.34 on mean sea surface temperatures and coral 
growth in the Red Sea satisfy the conditions for 
regression inference? To examine this, here are the 
residuals: ) CORALR 


SST 


29.68 29.87 30.16 30.22 30.48 30.65 30.90 


Residual 


—0.067 —0.060 0.128 0.066 0.025 —0.024 —0.068 
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(a) Linear relationship. A plot of the residuals against 
the explanatory variable x magnifies the deviations 
from the least-squares line. Does the plot show any 
systematic deviation from a roughly linear pattern? 

(b) Normal variation about the line. Make a histo- 
gram of the residuals. With only seven observations, 
no clear shape emerges. Do strong skewness or 
outliers suggest lack of Normality? 

(c) Independent observations. Why are the seven 
observations independent? 

(d) Variability about the line stays the same. Does 
your plot in part (a) show any systematic change in 
variability as x changes? 


Our brains don’t like losses. Exercise 4.29 
(page 121) describes an experiment that showed a 
linear relationship between how sensitive people 
are to monetary losses (“behavioral loss aver- 
sion”) and activity in one part of their brains 
(“neural loss aversion”). {jj)) Losses 


(a) Make a scatterplot with neural loss aversion 

as x and behavioral loss aversion as y. One point 

is a high outlier in both the x and y directions. In 
Exercise 5.41 (page 156) you found that this outlier 
is not influential for the least-squares line. 

(b) The research report says that r = 0.85 and that 
the test for regression slope has P < 0.001. Verify 
these results, using all the observations. 

(c) The report recognizes the outlier and says, 
“However, this regression also remained highly sig- 
nificant (P = 0.004) when the extreme data point 
(top right corner) was removed from the analysis.” 
Repeat your analysis omitting the outlier. Show 


26.38 


that the outlier influences regression inference by 
comparing the t statistic for testing slope with and 
without the outlier. Then verify the report’s claim 
about the P-value of this test. 


Time at the table. Does how long young children 
remain at the lunch table help predict how much 
they eat? Here are data on 20 toddlers observed over 
several months at a nursery school.!° “Time” is the 
average number of minutes a child spent at the table 
when lunch was served. “Calories” is the average 
number of calories the child consumed during lunch, 
calculated from careful observation of what the child 
ate each day. TABTIME 


Time 


BVA 3085 Sie 335) 32.8) 39512283471 33.9) 43°83 


Calories 


Gee Geeks) aie) GeI0) ys) Guy) Sis) Gill Are) ar 


Time 


42.4 43.1 29.2 31.3 28.6 32.9 30.6 35.1 33.0 43.7 


Calories 
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450 410 504 437 489 436 480 439 444 408 


(a) Make a scatterplot. Find the correlation and the 
least-squares regression line. (Be sure to save the 
regression residuals.) Based on your work, describe 
the direction, form, and strength of the relationship. 
(b) Check the conditions for regression inference. 
Parts (a) to (d) of Exercise 26.36 provide a handy 
outline. Use vertical limits —100 to 100 in your plot 
of the residuals against time to help you see the pat- 
tern. What do you conclude? 

(c) Is there significant evidence that more time at 
the table is associated with more calories con- 
sumed? Give a 95% confidence interval to estimate 
how rapidly calories consumed changes as time at 
the table increases. 


DNA on the ocean floor. We think of DNA as the 
stuff that stores the genetic code. It turns out that 
DNA occurs, mainly outside living cells, on the ocean 
floor. It is important in nourishing seafloor life. Scien- 
tists think that this DNA comes from organic matter 
that settles to the bottom from the top layers of the 
ocean. “Phytopigments,” which come mainly from 
algae, are a measure of the amount of organic mat- 
ter that has settled to the bottom. The data contains 
concentrations of DNA and phytopigments (both in 
grams per square meter) in 116 ocean locations around 
the world.!! Look first at DNA alone. Describe the 
distribution of DNA concentration and give a confi- 
dence interval for the mean concentration. Be sure to 
explain why your confidence interval is trustworthy 
in the light of the shape of the distribution. The data 
show surprisingly high DNA concentration, and this 
by itself was an important finding. DNA 


Time at the table: prediction. Rachel attends 
the nursery school of Exercise 26.38. Over several 
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months, Rachel averages 40 minutes at the lunch 
table. Give a 95% interval to predict Rachel’s aver- 


age calorie consumption at lunch. TABTIME 


Exercises 26.41 to 26.45 ask practical questions involving regres- 
sion inference without step-by-step instructions. Do complete re- 
gression analyses, using the Plan, Solve, and Conclude steps of the 
four-step process to organize your answers. Follow the model of 
Example 26.9 and the subsequent discussion, and check the condi- 
tions as part of the Solve step. 


26.41 


26.42 


26.43 


26.44 


Yukon squirrels. The introduction to Exercise 4.47 
(page 125) gives data on the abundance of the pine- 
cones that red squirrels feed on and the population 
density of red squirrels over 23 years. How signifi- 
cant is the evidence that more cones lead to higher 
population density? (Use a vertical scale from —1.5 
to 1.5 in your residual plot to show the pattern more 
clearly.) 


(Optional topic) A big-toe problem. Table 7.4 (page 
197) and Exercises 7.47 and 7.49 describe the rela- 
tionship between two deformities of the feet in young 
patients. Metatarsus adductus (MA) may help pre- 
dict the severity of hallux abducto valgus (HAV). 
The paper that reports this study says, “Linear re- 
gression analysis, using the hallux abducto angle as 
the response variable, demonstrated a significant 
correlation between the metatarsus adductus and 
hallux abducto angles.”!? Do a suitable analysis 
to verify this finding. The study authors note that 
the scatterplot suggests that the variation in y may 
change as x changes, so they offer a more elaborate 
analysis as well. BIGTOE 


Beavers and beetles. Exercise 5.59 (page 159) de- 
scribes a study that found that the number of stumps 
from trees felled by beavers predicts the abundance of 
beetle larvae. Is there good evidence that more bee- 
tle larvae clusters are present when beavers have left 
more tree stumps? Estimate how many more clusters 
accompany each additional stump, with 95% confi- 


dence. BEAVERS 


Sulfur, the ocean, and the sun. Sulfur in the 
atmosphere affects climate by influencing formation 
of clouds. The main natural source of sulfur is dimeth- 
ylsulfide (DMS) produced by small organisms in the 
upper layers of the oceans. DMS production is in turn 
influenced by the amount of energy the upper ocean 
receives from sunlight. Exercise 4.30 (page 122) gives 
monthly data on solar radiation dose (SRD, in watts 
per square meter) and surface DMS concentration (in 
nanomolars) for a region in the Mediterranean. Do the 
data provide convincing evidence that DMS increases 
as SRD increases? We also want to estimate the rate 
of increase, with 90% confidence. £jjy,| suLFuR 
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26.45 


26.46 
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standardized 
residuals 
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DNA on the ocean floor. Another conclusion of 
the study introduced in Exercise 26.39 was that or- 
ganic matter settling down from the top layers of the 
ocean is the main source of DNA on the seafloor. 
An important piece of evidence is the relationship 
between DNA and phytopigments. Do the data give 
good reason to think that phytopigment concentra- 
tion helps explain DNA concentration? (Try vertical 
limits —1 to 1 to make the pattern of your residual 
plot clearer.) yy)) ona 


(Optional topic) A lurking variable. Return to 
the data on selling price versus appraised value for 
beachfront condominiums that are the basis for the 
Check Your Skills Exercises 26.16 to 26.24. The data 
are in order by date of the sale, and the data table 
includes the number of months from the start of the 
data period. Here are the residuals from the regres- 
sion of selling price on appraised value (rounded) 
ordered from left to right: £jj)) conpREs 
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standardized residuals have the standard z-score 
scale, it is easier to judge whether any are extreme. 
Figure 26.13 and the associated data include the 
standardized residuals for the regression of gate ve- 
locity on piston wall thickness. 


(a) Find the mean and standard deviation of the 
standardized residuals. Why do you expect values 
close to those you obtain? 

(b) Make a stemplot of the standardized residuals. 
Are there any striking deviations from Normality? 
(c) The most extreme standardized residual is 

zy = —2.21. Minitab flags this as “large.” What is 
the probability that a standard Normal variable 
takes a value this extreme (that is, less than —2.21 
or greater than 2.21)? Your result suggests that a 
residual this extreme would be a bit unusual when 
there are only 12 observations. That’s why we 
examined Observation 9 in Exercise 26.28. 


(Optional topic) Tests for the intercept. Figure 26.7 
gives Minitab output for the regression of 


—117.36 —162.48 —178.04 —51.17 —152.00 185.56 —35.93 blood alcohol content (BAC) on number of 
146.25 558.43 229.38 333215 —105.04 S226 21S oi beers consumed. The t test for the hypoth- 
197.20 312.84 599.82 —212.40 —103.89 279.13 —152.20 esis that the population regression line has 
(233 SOI 254 O05 81 OU S= 20:9 99:23 98:10 slope B = 0 has P < 0.001. The data show 

6.01 73.10 —173.33 —95.12 —145.05 298.23 —164.86 a positive linear relationship between BAC 
Neiesy? 127.82 13.35 —240.66 88.59 16.46 


(a) Plot the residuals against the explanatory vari- 
able (appraised value). To make the pattern clearer, 
use vertical limits —600 to 600. Does the pattern 
you see agree with the conditions of linear relation- 
ship and constant standard deviation needed for 
regression inference? 

(b) Make a stemplot of the residuals. Are there 
strong deviations from Normality that would pre- 
vent regression inference? 

(c) Next, plot the residuals against month. Are the 
positive and negative residuals randomly scattered, 
as would be the case if the conditions for regression 
inference are satisfied? (Comment: Suppose prices 
for beachfront property rise rapidly during any 
period. Because property is reassessed just once a 
year, selling prices might pull away from appraised 
values over time in the period, creating a pattern of 
many negative residuals followed by several positive 
residuals. As this example illustrates, it is often wise 
to plot residuals against important lurking variables 
as well as against the explanatory variable.) 
(Optional topic) Standardized residuals. Software 
often calculates standardized residuals as well as 
the actual residuals from regression. Because the 
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and beers. We might expect the intercept a of 

the population regression line to be 0, because 
no beers (x = 0) should produce no alcohol in the 
blood (y = 0). To test 


Hyp: a = 0 
H,; a #0 


we use a t statistic formed by dividing the least- 
squares intercept a by its standard error SE,. Locate 
this statistic in the output of Figure 26.7 and verify 
that it is in fact a divided by its standard error. 
What is the P-value? Do the data suggest that the 
intercept is not zero? 


(Optional topic) Confidence intervals for the 
intercept. The output in Figure 26.7 allows you to 
calculate confidence intervals for both the slope B 
and the intercept a of the population regression line 
of BAC on beers in the population of all students. 
Confidence intervals for the intercept a have the 
familiar form a + t*SE, with degrees of freedom 
n — 2. What is the 95% confidence interval for the 
intercept? Does it contain zero, the value we might 
guess for a? 
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Exploring the Web 


Predicting batting averages. As you did in Exercise 5.66 (page 161), go to 
www.mlb.com/ and find the batting averages for a diverse set of 30 players 
for both the 2012 and 2013 seasons. You can click on the “Stats” tab to find the 
results for the current season as well as historical data. You should select only 
players who played in at least 50 games both seasons. Find the least-squares re- 
gression line for predicting batting average in 2013 from that in 2012 based on 
your sample of 30 players. In 2012, the major league leader in batting was Buster 
Posey, who had a batting average of 0.336. Find a 95% prediction interval for the 
2013 batting average of someone who hit 0.336 in 2012. How does this prediction 
compare with Buster Posey’s 2013 batting average? 


Olympic medal counts. In Exercise 4.50 (page 126) you made a scatterplot 
of the Winter Olympics medal counts for 2010 and 2014. We investigate these 
medal counts further. Go to the Chance News website at www. causeweb.org/ 
wiki/chance/index.php/Chance News 61#Predicting medal _ 
counts and read the article “Predicting Medal Counts.” Next, search the web (as 
you did in Chapter 4) and locate the Winter Olympics medal counts for 2010 and 
2014 (I found Winter Olympics medal counts on Wikipedia). Find the equation 
of the least-squares regression line for predicting the 2014 medal counts from the 
2010 counts. Compute 95% confidence intervals for the slope and intercept of your 
regression line. Are your results consistent with the comment in the Chance News 
article that states “we would have done well simply predicting that the Vancouver 
totals would match the Torino totals”? 


Exploring the Web 
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In this chapter 


One-Way Analysis ve ever 


27.1 Comparing several means 


oO  aeiee one oS 


Using technology 
27.4 The idea of analysis of 


Several Means ies 


27.5 Conditions for ANOVA 


27.6 F distributions and degrees 
of freedom 


27.7 Some details of ANOVA* 
he two-sample t procedures of Chapter 21 compare the means of two popula- 
tions or the mean responses to two treatments in an experiment. Of course, 
studies don’t always compare just two groups. We need a method for compar- 
ing any number of means. 


EXAMPLE 27.1 Comparing Tropical Flowers 


the different species of hummingbirds that fertilize the flowers.’ Over time, the 
researchers believe, the lengths of the flowers and the forms of the hummingbirds’ 
beaks have evolved to match each other. If that is true, flower varieties fertilized by 
different hummingbird species should have distinct distributions of length. 

Table 27.1 gives length measurements (in millimeters) for samples of three variet- 
ies of Heliconia, each fertilized by a different species of hummingbird. Do the three 
varieties display distinct distributions of length? In particular, are the mean lengths 
of their flowers different? 


Ethan Temeles and W. John Kress of Amherst College studied the relation- 
ship between varieties of the tropical flower Heliconia on the island of Dominica and 
FLOWER 
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H. BIHAI 


47.12 46.75 46.81 47.12 46.67 47.43 46.44 46.64 
48.07 48.34 48.15 50.26 50.12 46.34 46.94 48.36 


a> 
€ 
3 
x 
oe 
= 
a 
£ 
3 
Z 
12) 


H. CARIBAEA RED 
41.90 42.01 41.93 43.09 41.47 41.69 39.78 40.57 
39.63 42.18 40.66 37.87 39.16 37.40 38.20 38.07 
38.10 37.97 38.79 38.23 38.87 37.78 38.01 


H. CARIBAEA YELLOW 
36.78 37.02 36.52 36.11 36.03 35.45 38.13 37.10 
35.17 36.82 36.66 35.68 36.03 34.57 34.63 


[ ; Use graphs and numerical descriptions to describe and compare the three 
distributions of flower length. Finally, ask whether the differences among the mean 
lengths of the three varieties are statistically significant. 


Boe GRGTeaeo: Figure 27.1 displays side-by-side stemplots with the stems 


lined up for easy comparison. The lengths have been rounded to the nearest tenth of 
a millimeter. Here are the summary measures we will use in further analysis: 


Sample Variety Sample Size Mean Length Standard Deviation 
1 H. bihai 16 47.60 1.213 
2 H. caribaea red 23 39.71 1.799 
3 H. caribaea yellow 15 36.18 0.975 


(EONS The three varieties differ so much in flower length that 


there is little overlap among them. In particular, the flowers of H. bihai are longer 
than either H. caribaea red or H. caribaea yellow. The mean lengths are 47.6 mm for 
H. bihai, 39.7 mm for H. caribaea red, and 36.2 mm for H. caribaea yellow. Are these 
observed differences in sample means statistically significant? We must develop a 
test for comparing more than two population means. m 


H. bihai H. caribaea red H. caribaea yellow 
34 34 34166 
35 35 $5. | 2.5:7 
36 36 36 |0015788 
37 37 | 489 37 | 01 
38 38 |00112289 38] 1 
39 39 | 268 39 
40 40 | 67 40 
41 41|5799 41 
42 42 |02 42 
43 435 | 1 43 
44 44 44 
45 45 45 
FIGURE 27.1 46 |3467889 46 46 
Side-by-side stemplots comparing the 47) 114 47 47 
lengths in millimeters of samples of 48 |1234 48 48 
flowers from three varieties of Heliconia, 49 49 49 
from Table 27.1. 50/13 50 50 


CHAPTER 27 ® One-Way Analysis of Variance: Comparing Several Means 


(a) (0) 
Distributions Variety = bihai 
Length 
ne 50.0 
0.50 
0.40 
0.30 475 
0.20 
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Tp aT TTT i Ts T aw 
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= 
Distributions Variety = red = 425 
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0.50 = 40.0 
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0.30 = 
0.20 
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35.0 
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Length bihai red 
Variety 
0.50 
0.40 
0.30 
0.20 
0.10 
Ln LaLa FIGURE 27.2 


33 35 37 39 41 43 45 47 49 51 


from three varieties of Heliconia, from Table 27.1. 


In the Solve step, we have chosen to draw side-by-side stemplots to compare the 
three distributions. The three distributions can also be compared using histograms 
or boxplots. Figure 27.2(a) provides comparative histograms and Figure 27.2(b) 
comparative boxplots. Because we want to compare the size of the responses for 
the three distributions as well as examine the general shape, it is best to align 
the response axis for the three graphs for ease of comparison. If the histograms 
in Figure 27.2(a) were side by side rather than above each other as in the figure, 
comparisons of the lengths for the three varieties would be more difficult. For com- 
parison purposes, it is also best to use the percent scale for the histograms when the 
sample sizes differ. Figure 27.2(b) shows the comparative boxplots. Our impression 
of the distributions is similar for each of the three graphical displays. The flowers 
of H. bihai are longer than either red or yellow, and the variability for the lengths 
of the red flowers is somewhat larger than for the other two varieties. The choice 
of an appropriate graph is based on your software capabilities and, to some extent, 
on personal preference. Although boxplots show less detail, they are useful for com- 
parison as long as the sample sizes are not very small. 


Call the mean lengths for the three populations of flowers jw; for H. bihai, wy 
for H. caribaea red, and py; for H. caribaea yellow. The subscript reminds us which 
group a parameter or statistic describes. To compare these three population means, 
we might use the two-sample t test several times: 


« Test Ho: @; = M2 to see if the mean length for H. bihai differs from the mean 
for H. caribaea red. 


yellow 


Histograms (a) and boxplots (b) comparing the lengths in millimeters of samples of flowers 
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multiple comparisons 


analysis of variance F test 


e Test Ho: w; = ps to see if H. bihai differs from H. caribaea yellow. 
ew Test Ho: 2. = ps to see if H. caribaea red differs from H. caribaea yellow. 


The weakness of doing three tests is that we get three P-values, one for each test 
alone. That doesn’t tell us how likely it is that three sample means are as different 
from each other as these are. It may be that x, = 47.60 and x; = 36.18 are sig- 
nificantly different if we look at just two groups, but not significantly different if we 
know that they are the largest and the smallest means in three groups. As we look 
at more groups, we expect the gap between the largest and smallest sample mean 

to get larger. (Think of comparing the tallest and shortest person in larger 
A and larger groups of people.) We can't safely compare many parameters by doing 
tests or confidence intervals for two parameters at a time. 

The problem of how to do many comparisons at once with an overall measure 
of confidence in all our conclusions is common in statistics. This is the problem of 
multiple comparisons. Statistical methods for dealing with multiple comparisons 
usually have two steps: 


1. An overall test to see if there is good evidence of any differences among the 
parameters that we want to compare, and 


2. A detailed follow-up analysis to decide which of the parameters differ and to 
estimate how large the differences are. 


The overall test, though more complex than the tests we have met to this point, 
is reasonably straightforward. Formal follow-up analysis can be quite elaborate. We 
will concentrate on the overall test and use data analysis to describe in detail the 
nature of the differences. Companion Chapter 30, available online, presents some 
details of follow-up inference. 


() LaunchPad Online Resources 


* The StatBoards video, Inference for More than Two Sample Means, 
provides further discussion of comparing more than two means and multiple 
comparisons. 


We want to test the null hypothesis that there are no differences among the mean 
lengths for the three populations of flowers: 


Ap: fi = fa = Bs 
The basic conditions for inference (more detail later) are that we have random samples 
from the three populations and that flower lengths are Normally distributed in each 
population. 
The alternative hypothesis is that there is some difference. That is, not all three 
population means are equal: 


H,: not all of 4, 2, and 3 are equal 


The alternative hypothesis is no longer one-sided or two-sided. It is “many-sided” 
because it allows any relationship other than “all three equal.” For example, H, 
includes the case in which , = m3 but pw, has a different value. The test of Ho 
against H, is called the analysis of variance F test. Analysis of variance is usually 
abbreviated as ANOVA. The ANOVA F test is almost always carried out by soft- 
ware that reports the test statistic and its P-value. 


27.2 The Analysis of Variance F Test | 649 


BOSE Mioicueoe Software tells us that, for the flower length data in Table 27.1, 
the test statistic is F = 259.12 with P-value P < 0.0001. There is very strong 42 
evidence that the three varieties of flowers do not all have the same mean length. 
The F test does not say which of the three means are significantly different. It 
appears from our preliminary data analysis that H. bihai flowers are distinctly longer 
than either H. caribaea red or H. caribaea yellow. H. caribaea red and H. caribaea yel- 
low are closer together, but the H. caribaea red flowers tend to be longer. 


[ESOMSRUDA There is strong evidence (P < 0.0001) that the population means 
are not all equal. The most important difference among the means is that the H. bihai 
variety has longer flowers than the H. caribaea red and yellow varieties. m= 


Example 27.2 illustrates our approach to comparing means. The ANOVA F test 
(done by software) assesses the evidence for some difference among the population 
means. Formal follow-up analysis would allow us to say which means differ and 
by how much, with (say) 95% confidence that all our conclusions are correct. We 
rely instead on examination of the data to show what differences are present and 
whether they are large enough to be interesting. 


Apply Your Knowledge 


27.1 Road Rage. “The phenomenon of road rage has been frequently dis- 
cussed but infrequently examined.” So begins a report based on interviews 
with 1382 randomly selected drivers.” The respondents’ answers to interview 
questions produced scores on an “angry/threatening driving scale” with val- 
ues between 0 and 19. What driver characteristics go with road rage? There 
were no significant differences among races or levels of education. What 
about the effect of the driver’s age? Here are the mean responses for three 
age groups: 


<30 yr 30-55 yr >55 yr 
22D iNe3}3} 0.66 


The report says that F = 34.96, with P < 0.01. 


(a) What are the null and alternative hypotheses for the ANOVA F test? 
Be sure to explain what means the test compares. 


(b) Based on the sample means and the F test, what do you conclude? 


27.2 Angry Women, Sad Men. What are the relationships among the portrayal 
of anger or sadness, sex, and status conferred? Sixty-eight subjects were ran- 
domly assigned to view a videotaped job interview in which either a male or 
a female professional described feeling either anger or sadness. The targets 
being interviewed wore professional attire and were ostensibly being inter- 
viewed for a job while sitting at a table. The targets described an incident 
in which they and a colleague lost an account, and when asked by the inter- 
viewer how it made them feel, responded that the incident made them feel 
either angry or sad. The subjects were divided into four groups and evaluated 
one of the following four types of interviews:? 


Male Target Female Target 
Expressed anger Group A Group C 


Expressed sadness Group B Group D 
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FIGURE 27.3 

Bar graph comparing the mean status 
conferred for the four types of targets, 
for Exercise 27.2. 


FIGURE 27.4 

ANOVA for the flower length data: output 
from a graphing calculator, three statistical 
programs, and a spreadsheet program. 


After watching the interview, subjects evaluated the target on a composite 
measure of status conferral which included items assessing how much status, 
power, and independence the target deserved in his or her future job. The 
measure of status ranged from | = none to 11 = a great deal. 


(a) What are the null and alternative hypotheses for the ANOVA F test? 
Be sure to explain what means the test compares. 


(b) Figure 27.3 is a bar graph displaying the means for the four groups. 
What is the approximate size of the mean difference in status con- 
ferred on angry men versus angry women? Which of these two groups 
has the higher mean status? 
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Any technology used for statistics should perform analysis of variance. Figure 27.4 
displays ANOVA output for the data of Table 27.1 from a graphing calculator, three 
statistical programs, and a spreadsheet program. 

Minitab, JMP, and Excel give the sizes of the three samples and their means. 
These agree with those in Example 27.1. Minitab also gives the standard deviations, 
Excel gives the variances, and JMP gives the standard errors. All five outputs report 
the F test statistic, F = 259.12, and its P-value. Minitab reports the P-value as zero 
to three decimal places, while CrunchIt! and JMP report that P < 0.0001. This is 
all we need to know about the P-value in practice. Excel and the graphing calcula- 
tor offer the specific value 1.92 X 10~?’. (This would be correct if the population 
distributions were exactly Normal. In practice, read such values simply as “P is very 
small.”) There is very strong evidence that the three varieties of flowers do not all 
have the same mean length. 


Texas Instruments Graphing Calculator 


oO 
17 FS 


One-way ANOVA 
t M5=541.436183 
Error 


df=31 

55=1606.565761 

MS=2, 68952472 
SxP=1.44551384 


= 
a 
d 
S 
M 
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27.3 Using Technology 


Minitab FIGURE 27.4 
f@ Session (Continued). 
One-way ANOVA: length versus variety q 
Source DF SS Ms F P 


Variety 2 1082.87 541.44 259.12 0.000 
Error 52! 106.57 2.09 
Total 53) 1189.44 


S = 1.446 R-Sq = 91.04% R-Sq(adj) = 90.69% 


Individual 95% CIs For Mean Based on 
Pooled StDev 


Level N Mean. (StDevs sss 55 5 > aS ae a ad i ese barge a og + 
Bihai 16 47.598 1.213 (oe) 
Red 230 93:97 11 A799, (#2) 
Yellow 15 36.180 0.975 (-*--) 
--s555-- Glia etal alate RE Salta a 
38.5 42.0 45.5 49.0 
Pooled StDev = 1.446 
Crunchlt! 
Results - One-Way ANOVA [aml >¢) 
Export + | 


Variety] 1083 541.4 A591 <0.0001 


ea fos fof [| | 


A Analysis of Variance 


2 1082.8724 541.436 259.1193 <.0001* 
sos Re 0] eo DoT) 2.090 
Se} ae) chet 


| Means for Oneway Anova 


16 47.5975 0.36138 46.872 48.323 
23 Boo Tals) (0) .eioalesat 39.106 40.316 
iS) 36.1800 0.37323 35.431 E165 22) 


Std Error uses a pooled estimate of error variance 


Excel 
3 Microsoft Excel - ta25-01.dat 


| 


Anova: ‘Single Factor | 
a 
Groups 
761.56 47.5975 


913.36 39.7113 
542.7 36.18 


Variance 
1.471073 
3.235548 
0.951257 


Average 
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HTL ee 
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: 


F crit 
3.178799 


P-value 
1.92E-27 


im 
e of variation Ms 
[Between Groups 1082.872 541.4362 
[Within Groups 106.5658 2.089525 


259.1193 
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All five outputs report degrees of freedom (df), sums of squares (SS), and mean 
squares (MS). We don’t need this information now. Minitab and JMP also give 
confidence intervals for all three means that help us see which means differ and 
by how much. None of the intervals overlap, and H. bihai is much above the other 
two. These are 95% confidence intervals for each mean separately. We are not 95% 
confident that all three intervals cover the three means. This is another example of 
the peril of multiple comparisons. 


Apply Your Knowledge 


27.3 Logging in the Rain Forest. How does logging in a tropical rain forest 
affect the forest in later years? Researchers compared forest plots in Borneo 
that had never been logged (Group 1) with similar plots nearby that had been 
logged one year earlier (Group 2) and eight years earlier (Group 3). Although 
the study was not an experiment, the authors explained why we can consider 
the plots to be randomly selected. The data appear in Table 27.2. The vari- 
able Trees is the count of trees in a plot; Species is the count of tree species in 
a plot. The variable Richness is Species/Trees, the number of species divided 
by the number of individual trees.* Ely) LOGFULL 


GROUP TREES SPECIES RICHNESS | GROUP TREES SPECIES RICHNESS 
1 27 22 0.81481 2 18 15 0.83333 
1 yy) 18 0.81818 2 17 15 0.88235 
1 29 22 0.75862 2 14 12 0.85714 
1 Dil 20 0.95238 2 14 13 0.92857 
1 19 15 0.78947 2 2 2 1.00000 
1 33 Ml 0.63636 z 17 15 0.88235 
1 16 13 0.81250 Zz 19 8 0.42105 
1 20 13 0.65000 3 18 17 0.94444 
1 24 19 0.79167 3 4 4 1.00000 
1 27 13 0.48148 B) 22 18 0.81818 
1 28 19 0.67857 3 15 14 0.93333 
1 19 15 0.78947 ) 18 18 1.00000 
2 12 11 0.91667 3 19 15 0.78947 
2 12 11 0.91667 3 22 15 0.68182 
2 15 14 0.93333 3 12 10 0.83333 
2 9 7 0.77778 3 12 12 1.00000 
z 20 18 0.90000 


(a) Make an appropriate comparative graph of the variable Trees for the 
three groups. What effects of logging are visible? 

(b) Figure 27.5 shows Excel ANOVA output for Trees. What do the group 
means show about the effects of logging? 

(c) What are the values of the ANOVA F statistic and its P-value? What 
hypotheses does F test? What conclusions about the effects of logging 
on number of trees do the data lead to? 
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FIGURE 27.5 
Excel output for analysis of variance on the number of trees in forest plots, for Exercise 27.3. 


27.4 Political Views and Age. The University of Chicago’s General Social 
Survey (GSS) is the nation’s most important social science sample survey. 
The GSS asked a random sample of adults in 2012 their age and where they 
placed themselves on the political spectrum from extremely liberal to ex- 
tremely conservative. The categories in the original survey included slightly 
liberal, liberal, and extremely liberal, but these have been combined into the 
single category liberal, and similarly with conservative.’ 


(a) Figure 27.6 gives the Minitab ANOVA output for these data. What do 
the mean ages say about the relationship between age and political views? 


(b) What are the values of the ANOVA F statistic and its P-value? What 
hypotheses does F test? Briefly describe the conclusions you draw from 


these data. 
f@ Session = Vi 
~ 
One-way ANOVA: Politics vs Age | 
Source DF ss MS F P 
Factor 2 7319 3659 11.69 0.000 
Error Leva Se 5597 chee} 
Total 1873 592910 
S$ =17.69 R-Sq = 1.23% R-Sq(adj) = 1.13% 


Individual 95% CIs For Mean Based on 
Pooled StDev 


Level N Mean StDev Seehessssass= qe Sete Roem $ocfece--- +----- 
Conservative 628 51.09 17.42 (----- *------ ) 
Moderate Tis) 5 465.98 7 sks (------ *----- ) 
Liberal 533 46.80 17.41 (------ *------H ) 
---t--------- +--------- +--------- +----- 
46.0 48.0 50.210) 525.0 


Pooled StDev = 17.69 


The details of ANOVA are a bit daunting (they appear in the optional Section 27.7 
at the end of this chapter). The main idea of ANOVA is more accessible and much 
more important. Here it is: when we ask if a set of sample means gives evidence for 
differences among the population means, what matters is not how far apart the sample 
means are but how far apart they are relative to the variability of individual observations. 


~ 


FIGURE 27.6 

Minitab output for the data on 
respondents’ ages for three different 
political viewpoints, for Exercise 27.4. 
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FIGURE 27.7 

Boxplots for two sets of three samples 
each. The sample means are the 

same in parts (a) and (b). Analysis 

of variance will find a more significant 
difference among the means in part (b) 
because there is less variation among 
the individuals within those samples. 


TH 5 


Look at the two sets of boxplots in Figure 27.7. For simplicity, these distributions 
are all symmetric, so that the mean and median are the same. The center line in 
each boxplot is therefore the sample mean. Both sets of boxplots compare three 
samples with the same three means. Could differences this large easily arise just due 
to chance, or are they statistically significant? 


« The boxplots in Figure 27.7(a) have tall boxes, which show lots of variation 
among the individuals in each group. With this much variation among 
individuals, we would not be surprised if another set of samples gave quite 
different sample means. The observed differences among the sample means 
could easily happen just by chance. 


« The boxplots in Figure 27.7(b) have the same centers as those in Figure 27.7(a), 
but the boxes are much shorter. That is, there is much less variation among 
the individuals in each group. It is unlikely that any sample from the first 
group would have a mean as small as the mean of the second group. Because 
means as far apart as those observed would rarely arise just by chance in 
repeated sampling, they are good evidence of real differences among the 
means of the three populations we are sampling from. 


You can use the One-Way ANOVA applet to demonstrate the analysis of vari- 
ance idea for yourself. The applet allows you to change both the group means and 
the variability within groups. You can watch the ANOVA F statistic and its P-value 
change as you work. 

This comparison of the two parts of Figure 27.7 is too simple in one way. It ignores 
the effect of the sample sizes, an effect that boxplots do not show. Small differences 
among sample means can be significant if the samples are large. Large differences among 

sample means can fail to be significant if the samples are small. All we can 
A be sure of is that for the same sample size, Figure 27.7(b) will give a much 

smaller P-value than Figure 27.7(a). Despite this qualification, the big idea 
remains: if sample means are far apart relative to the variation among individuals in 
the same groups, that’s evidence that something other than chance is at work. 


The Analysis of Variance Idea 


Analysis of variance compares the variation due to specific sources with the varia- 
tion among individuals who should be similar. In particular, ANOVA tests whether 
several populations have the same mean by comparing how far apart the sample 
means are with how much variation there is within the samples. 


It is one of the oddities of statistical language that methods for comparing means 
are named after the variance. The reason is that the test works by comparing two 
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kinds of variation. Analysis of variance is a general method for studying sources of 
variation in responses. Comparing several means is the simplest form of ANOVA, 
called one-way ANOVA. one-way ANOVA 


The ANOVA F Statistic 


The analysis of variance F statistic for testing the equality of several means has 
this form: 


variation among the sample means 


variation among individuals in the same sample 


If you want more detail, read the optional material in Section 27.7 at the end of this 
chapter. The F statistic can take only values that are zero or positive. It is zero only when 
all the sample means are identical and gets larger as they move farther apart. Large 
values of F are evidence against the null hypothesis Hp that all population means are 
the same. Although the alternative hypothesis H, is many-sided, the ANOVA F test is 
one-sided because any violation of Hy tends to produce a large value of F. 


() LaunchPad Online Resources 


* The StatClips video, ANOVA-Background and Introduction, and the 
Snapshots video, Introduction to ANOVA, both discuss the rationale for 
ANOVA and the hypotheses being tested with the ANOVA F statistic. 


* The StatClips Examples video, ANOVA-Background and Introduction 
Example A, provides an additional example of the idea of ANOVA. 


Apply Your Knowledge 


27.5 ANOVA Compares Several Means. The One-Way ANOVA applet displays 
the observations in three groups, with the group means highlighted by black dots. 


(a) What are the F and P value for these three samples? (The P-value is marked 
by a red dot that will move along the scale as you modify the samples.) 


(b) Which group has the largest mean? If you grab its mean point with 
the mouse and begin to move it in either direction, the F value will 
change. How small can you make F? What did you do to this mean to 
make F small? Roughly how significant is your small F? 


(c) Starting with your configuration at the end of part (b), drag any one 
of the group means either up or down as far as they will go. What hap- 
pens to F? What happens to the P-value? How large can you make F? 
What did you do to the three means to make F large? 


27.6 ANOVA Uses Within-Group Variation. If the One-Way ANOVA applet is 
open from Exercise 27.5, click on New Samples. Otherwise, open the One- eS 
Way ANOVA applet which displays the observations in three groups, with 
the group means highlighted by black dots. You are going to use the Applet 
to investigate the effects described in Figure 27.7 (page 654). 


(a) Use the mouse to slide the Standard Deviation at the top of the display 
to the right. You see that the group means do not change, but the vari- 
ability of the observations in each group increases. What happens to F 
and P as the variability among the observations in each group increases? 
What are the values of F and P when the slider is all the way to the 
right? This is similar to Figure 27.7 (a): variation within groups hides 
the differences among the group means. 
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iny WE WEREN’T 
LD WORKING ANYWAY 


A “consultant” estimated that the 
annual NCAA men’s basketball 
tournament costs employers 

$3.8 billion in time wasted by workers 
participating in office pools, checking 
game scores, and so on. That’s 
unlikely. Most of the games are 
played outside work hours, at night 
and on weekends. More to the point, 
economists note that workers waste 
lots of time every workday by talking 
to other employees, chatting on the 
phone, shopping online, and so on. 
The dollar value of time spent on 
the basketball tournament probably 
comes in large part from time we 
were wasting anyway. 


robustness 
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(b) Leave the Standard Deviation slider at the extreme right of its scale, so 
that variability within groups stays fixed. Use the mouse to move the 
group means apart. What happens to F and P as you do this? 


Like all inference procedures, ANOVA is valid only in some circumstances. Here 
are the conditions under which we can use ANOVA to compare population means. 


Conditions for ANOVA Inference 


» We have | independent SRSs, one from each of I populations. We measure the 
same quantitative response variable for each sample. 


© The ith population has a Normal distribution with unknown mean p;. One-way 
ANOVA tests the null hypothesis that all the population means are the same. 


= All the populations have the same standard deviation o, whose value is unknown. 


The first two conditions are familiar from our study of the two-sample t proce- 
dures for comparing two means. As usual, the design of the data production is the 
most important condition for inference. Biased sampling or confounding can make 

any inference meaningless. If we do not actually draw separate SRSs from 
A each population or carry out a randomized comparative experiment, it may be 

unclear to what population the conclusions of inference apply. ANOVA, like 
other inference procedures, is often used when random samples are not available. 
You must judge each use on its merits, a judgment that usually requires some knowl- 
edge of the subject of the study in addition to some knowledge of statistics. 

Because no real population has exactly a Normal distribution, the usefulness 
of inference procedures that assume Normality depends on how sensitive they are 
to departures from Normality. Fortunately, procedures for comparing means are 
not very sensitive to lack of Normality. The ANOVA F test, like the t procedures, 
shows robustness. What matters is Normality of the sample means, so ANOVA 
becomes safer as the sample sizes get larger because of the central limit theorem 
effect. Remember to check for outliers that change the value of sample means and 
for extreme skewness. When there are no outliers and the distributions are roughly 
symmetric, you can safely use ANOVA for sample sizes as small as 4 or 5. 

The third condition is annoying: ANOVA assumes that the variability of obser- 
vations, measured by the standard deviation, is the same in all populations. The t 
test for comparing two means (Chapter 21) does not require equal standard devia- 
tions. Unfortunately, the ANOVA F for comparing more than two means is less 
broadly valid. It is not easy to check the condition that the populations have equal 
standard deviations. Statistical tests for equality of standard deviations are very 
sensitive to lack of Normality, so much so that they are of little practical value. You 
must either seek expert advice or rely on the robustness of ANOVA. 

How serious are unequal standard deviations? ANOVA is not too sensitive to viola- 
tions of the condition, especially when all samples have the same or similar sizes and 
no sample is very small. When designing a study, try to take samples of about the same 
size from all the groups you want to compare. The sample standard deviations estimate 
the population standard deviations, so check before doing ANOVA that the sample 
standard deviations are similar to each other. We expect some variation among them 
due to chance. Here is a rule of thumb that is safe in almost all situations. 
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Checking Standard Deviations in ANOVA 


The results of the ANOVA F test are approximately correct when the largest sample 
standard deviation is no more than twice as large as the smallest sample standard 
deviation. 


The study of Heliconia blossoms is based on three independent samples that the 
researchers consider to be random samples from all flowers of these varieties in 
Dominica. The stemplots in Figure 27.1 show that the H. bihai and H. caribaea red 
varieties have slightly skewed distributions, but the sample means of samples of 
sizes 16 and 23 will have distributions that are close to Normal. The sample standard 
deviations for the three varieties are 

Ss; = 1.213 s,=1.799 s;=0975 
These standard deviations satisfy our rule of thumb: 

largests 1.799 | 

smallests 0.975 
We can safely use ANOVA to compare the mean lengths for the three populations. m 


1.85 (less than 2) 


(STATE® Kathleen Vohs of the University of Minnesota and her co-workers carried out 
several randomized comparative experiments on the effects of thinking about money. 
Here’s an outline of one of the experiments. Ask student subjects to unscramble 30 
sets of five words to make a meaningful phrase from four of the five. The control 
group unscrambled phrases like “cold it desk outside is” into “it is cold outside.” The 
“play money” group unscrambled similar sets of words, but a stack of Monopoly 
money was placed nearby. The “money prime” group unscrambled phrases that led 
to thinking about money, turning “high a salary desk paying” into “a high-paying sal- 
ary.” Then each subject worked a hard puzzle, knowing that they could ask for help. 
Table 27.3 shows the time in seconds that each subject worked on the puzzle before 
asking for help.© Psychologists think that money tends to make people self-sufficient. 
If so, the two groups that were encouraged in different ways to think about money 
should take longer on the average to ask for help. Do the data support this idea? 


-PLAN: Examine the data to compare the effect of the treatments and check that we 
can safely use ANOVA. If the data allow ANOVA, assess the significance of observed 
differences in mean times to ask for help. 


EEQRVAI Figure 27.8 compares the histograms of the data in the three groups. We 
expect some irregularity in small samples, but there are no outliers or strong skew- 
ness that would hinder use of ANOVA. The Minitab ANOVA output in Figure 27.9 
shows that the group standard deviations easily satisfy our rule of thumb. The control 
group subjects asked for help much sooner (mean 186.1 seconds) than did subjects 
in the two money groups (means 305.2 seconds and 314.1 seconds). The three 
means are significantly different (F = 3.73, P = 0.031). 


QS eMEA The experiment gives good evidence that reminding people of money 
in either of two ways does make them less willing to ask others for help. This is con- 
sistent with the idea that money makes people feel more self-sufficient. m 
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Time 


GROUP TIME GROUP TIME GROUP TIME 
ee Prime 609 Play 455 Conteal 118 
Ao Prime 444 Play 100 Control 272 
NR AKA URGEOR UDG Prime 242 Play 238 Control 413 
Prime 199 Play 243 Control 291 
Distributions Group = play Prime 174 Play 500 Control 140 
Time Prime 55 Play 570 Control 104 
aaa Prime 251 Play 231 Control 55 
Hee Prime 466 Play 380 Control 189 
0.10 Prime 443 Play 222 Control 126 
0 100 200 300 400 500 600 700 Prime Sel Play Ful Control 400 
Prime 135 Play 232 Control 92 
Distributions Group = prime Prime 241 Play 219 Control 64 
we Prime 476 Play 320 Control 88 
0.40 Prime 482 Play 261 Control 142 
ae Prime 362 Play 290 Control 141 
Ose Prime 69 Play 495 Control 373 
OO) Prime 160 Play 600 Control 156 

Play 67 


FIGURE 27.8 

Histograms comparing the time until 
subjects asked for help with a puzzle, for 
Example 27.4. 


One-way ANOVA: time versus group 


Source DF Ss MS F P 
Group é| L7AQID 87456 3273: 10:.031 
Error 49 1149568 23461 

Total 51 1234479 


S = 153.2 R-Sq = 13.21% R-Sq(adj) = 9.66% 


Individual 95% CIs For Mean Based on 
Pooled StDev 


Level N Mean@ SCD CVS See iem ta mnee sg anal Se ee pee ce BF nm oe BP ap cae 
OT Er Oll eel /aumeel 9 Gee 118 yO (aan ean nas ) 
play 18 305.2 162.5 (---------- #---------- ) 
prime De Are) PLD: <8 (Sa oe a ae Og ) 
Ft Ba a So oat eo 
FIGUBE 27.9 : 140 210 280 350 
Minitab ANOVA output for comparing Pooled StDev = 153.2 


the three treatments in Example 27.4. 


(3) LaunchPad Online Resources 


* The Snapshots video, ANOVA Inference, discusses the details of using the 
ANOVA F statistic for inference through an example and the StatBoards 
video, Checking Conditions for ANOVA, provides an additional example on 
checking the conditions for inference in ANOVA. 


Apply Your Knowledge 


27.7 Checking Standard Deviations. Verify that the sample standard deviations 
for these sets of data do allow use of ANOVA to compare the population means. 


(a) The counts of trees in Exercise 27.3 and Figure 27.5 
(b) The ages of Exercise 27.4 and Figure 27.6 


27.6 F Distributions and Degrees of Freedom 


27.8 Species Richness after Logging. Table 27.2 gives data on the species rich- 
ness in rain forest plots, defined as the number of tree species in a plot divided by 
the number of trees in the plot. ANOVA may not be trustworthy for the richness 
data. Do data analysis: make an appropriate graph to examine the distributions of 
the response variable in the three groups, and also compare the standard devia- 
tions. What characteristic of the data makes ANOVA risky? am LOGFULL 


27.9 Fertilizing Bromeliads. Bromeliads are tropical flowering plants. Many 
are epiphytes that attach to trees and obtain moisture and nutrients from air 
and rain. Their leaf bases form cups that collect water and are home to the 
larvae of many insects. Preliminary to a study of changes in the nutrient cycle, 
Jacqueline Ngai and Diane Srivastava examined the effects of adding nitrogen, 
phosphorus, or both to the cups. They randomly assigned eight bromeliads 
growing in Costa Rica to each of four treatment groups, including an unfertil- 
ized control group. A monkey destroyed one of the plants in the control group, 
leaving seven bromeliads in that group. Here are the numbers of new leaves on 
each plant over the seven months following fertilization:’ BROMLIAD 


Nitrogen Phosphorus Both Neither 


15 14 14 11 
14 14 16 13 
15 14 15 16 
16 11 14 15 
hy 13 14 15 
18 12 13 11 
ie 15 17 12 
13 15 14 


Analyze these data and discuss the results. Does nitrogen or phosphorus have 
a greater effect on the growth of bromeliads? Follow the four-step process as 
illustrated in Example 27.4. 


The ANOVA F statistic is 


variation among the sample means 


variation among individuals in the same sample 
To find the P-value for this statistic, we must know the sampling distribution of F 
when the null hypothesis (all population means equal) is true. This sampling distri- 
bution is an F distribution. 

The F distributions are a family of right-skewed distributions that take only 
values greater than zero. The density curves in Figure 27.10 illustrate their shapes. 
A specific F distribution is determined by the degrees of freedom of the numerator 
and denominator of the F statistic. You may have noticed that all our software out- 
puts include degrees of freedom, labeled either “df” or “DF.” Optional Section 27.7 
“Some details of ANOVA” shows where the degrees of freedom come from. When 
describing an F distribution, always give the numerator degrees of freedom first. Our 

brief notation will be F(df1, df2) for the F distribution with dfl degrees of 
A freedom in the numerator and df2 in the denominator. Interchanging the 
degrees of freedom changes the distribution, so the order is important. 


F distribution 
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F(9, 10) density curve F(2, 51) density curve 


0 B02) 


FIGURE 27.10 
Density curves for two F distributions. Both are right-skewed and take only positive values. The upper 5% 
critical values are marked under the curves. 


Tables of F critical points are awkward because we need a separate table for every 
pair of degrees of freedom dfl and df2. Fortunately, software gives you P-values for 
the ANOVA F test without the need for a table. 


Look again at the software output in Figure 27.4 for the flower length data. All three 
outputs give the degrees of freedom for the F test, labeled “df” or “DF” There are 
two degrees of freedom in the numerator and 51 in the denominator. P-values for the 
F test, therefore, come from the F distribution F(2, 51) with two and 51 degrees of 
freedom. The right-hand curve in Figure 25.10 is the density curve of this distribu- 
tion. The 5% critical value marked on that curve is 3.18, and the 1% critical value is 
5.05. The observed value F = 259.12 of the ANOVA F statistic lies far to the right 
of these values, so the P-value is extremely small. m 


The degrees of freedom of the ANOVA F statistic depend on the number of 
means we are comparing and the number of observations in each sample. That is, 
the F test takes into account the number of observations. Here are the details. 


Degrees of Freedom for the F Test 


We want to compare the means of I populations. We have an SRS of size n; from the 
ith population, so that the total number of observations in all samples combined is 


IN) = Wy SP Wy aR oP? ae ath 


If the null hypothesis that all population means are equal is true, the ANOVA F 
statistic has the F distribution with I — 1 degrees of freedom in the numerator and 
N — I degrees of freedom in the denominator. 


In Examples 27.1 and 27.2, we compared the mean lengths for three varieties of 
flowers, so / = 3. The three sample sizes are 


n=16 nm=23 ng=15 
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The total number of observations is therefore 
N=16+ 23+ 15= 54 
The ANOVA F test has numerator degrees of freedom 
/-1=3-1=2 
and denominator degrees of freedom 
N-/1=54-3=51 
These are the degrees of freedom given in the outputs in Figure 27.3. m 


Apply Your Knowledge 


27.10 Logging in the Rain Forest, Continued. Exercise 27.3 compares the 
number of tree species in rain forest plots that had never been logged 
(Group 1) with similar plots nearby that had been logged one year earlier 
(Group 2) and eight years earlier (Group 3). 


(a) What are I, n;, and N for these data? Identify these quantities in words 
and give their numerical values. 


(b) Find the degrees of freedom for the ANOVA F statistic. Check your 
work against the Excel output in Figure 27.5. 


27.11 What Music Will You Play? People often match their behavior to their 
social environment. One study of this idea first established that the type of 
music most preferred by black college students is R&B and that whites’ most 
preferred music is rock. Will students hosting a small group of other students 
choose music that matches the makeup of the people attending? Assign 90 
black business students at random to three equal-sized groups. Do the same 
for 96 white students. Each student sees a picture of the people he or she will 
host. Group 1 sees six blacks, Group 2 sees three whites and three blacks, 
and Group 3 sees six whites. Ask how likely the host is to play the type of 
music preferred by the other race. Use ANOVA to compare the three groups 
to see whether the racial mix of the gathering affects the choice of music.® 


(a) For the white subjects, F = 16.48. What are the degrees of freedom? 
(b) For the black subjects, F = 2.47. What are the degrees of freedom? © PhotoAlto/Alamy 


Now we will give the actual formula for the ANOVA F statistic. We have SRSs from 
each of I populations. Subscripts from 1 to I tell us to which sample a statistic refers to: 


Sample Sample Sample 

Population Size Mean Std. Dev. 
1 ny X1 $1 
y} nN? X2 $2 
I Ny X1 S] 


You can find the F statistic from just the sample sizes n;, the sample means Xx,, 
and the sample standard deviations s;. You don’t need to go back to the individual 
observations. 


*This more advanced section is optional if you are using software to find the F statistic. 
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mean squares 


mean square for groups (MSG) 


mean square for error (MSE) 


The ANOVA F statistic has the form 


variation among the sample means 


F= 


variation among individuals in the same sample 


The measures of variation in the numerator and denominator of F are called mean 
squares. A mean square is a more general form of a sample variance. An ordinary 
sample variance s’ is an average (or mean) of the squared deviations of observations 
from their mean, so it qualifies as a “mean square.” 

Call the overall mean response x. That is, x is the mean of all N observations 
together. You can find x from the I sample means by 


sum of all observations = myx; + mx) + °°: + nx, 


x= = 


N N 


(This expression works because multiplying a group mean x; by the number of 
observations n; it represents gives the sum of the observations in that group.) 

The numerator of F is a mean square that measures variation among the I sample 
means Xj, X), ... , X; Io measure this variation, look at the I deviations of the means 
of the samples from x, 


Rp ey — Rang ee 
The mean square in the numerator of F is an average of the squares of these devia- 
tions. We call it the mean square for groups (MSG): 
Wey > KP Ag ye eR ee 
I-1 
Each squared deviation is weighted by n;, the number of observations it represents. 
The mean square in the denominator of F measures variation among individual 
observations in the same sample. For any one sample, the sample variance s? does this 
job. For all I samples together, we use an average of the individual sample variances. 
It is another weighted average, in which each s? is weighted by its degrees of freedom 
n, — 1. The resulting mean square is called the mean square for error (MSE): 
(ny List es = ge ae ee Ss 1 se 
N-I 
“Error” doesn’t mean a mistake has been made. It’s a traditional term for chance 
variation. Here is a summary of the ANOVA test. 


MSG = 


MSE = 


The ANOVA F Test 


Draw an independent SRS from each of I Normal populations that have a common 
standard deviation but may have different means. The sample from the ith popula- 
tion has size n;, sample mean x;, and sample standard deviation s,. 


To test the null hypothesis that all I populations have the same mean against the alter- 
native hypothesis that not all the means are equal, calculate the ANOVA F statistic 


The numerator of F is the mean square for groups 
(Cs = BP =P aCe = RP ap ceo sey — Zo) 
t= il 
The denominator of F is the mean square for error 
Corea © Ss Aira C1 ieee ©) \y Ai a amd Cc tee Oy 
N= 
When H) is true, F has the F distribution with I — 1 and N — I degrees of freedom. 


MSG = 


MSE = 
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The denominators in the formulas for MSG and MSE are the two degrees 
of freedom I — 1 and N — I of the F test. The numerators are called sums of — sums of squares 
squares, from their algebraic form. It is usual to present the results of ANOVA in 
an ANOVA table. Output from software usually includes an ANOVA table. ANOVA table 


Look again at the three outputs in Figure 27.4. The three software outputs give the 
ANOVA table. The calculator, with its small screen, gives the degrees of freedom, 
sums of squares, and mean squares separately. Each output uses slightly different 
language to identify the two sources of variation. The basic ANOVA table is 


Source of Variation df Ss MS F Statistic 
Variation among samples 2 1082.87 MSG = 541.44 259.12 
Variation within samples 5il 106.57 MSE = 2.09 


You can check that each mean square MS is the corresponding sum of squares SS 
divided by its degrees of freedom df. The F statistic is MSG divided by MSE. m= 


Because MSE is an average of the individual sample variances, it is also called 
the pooled sample variance, written as s;. When all I populations have the same 
population variance a’, as ANOVA assumes that they do, sj estimates the com- 
mon variance o”. The square root of MSE is the pooled standard deviation, s,. pooled standard deviation 
It estimates the common standard deviation o of observations in each group. The 
Minitab and calculator outputs in Figure 27.4 give the value s, = 1.446. 
The pooled standard deviation s, is a better estimator of the common @ than any 
individual sample standard deviation s; because it combines (pools) the information 
in all I samples. We can get a confidence interval for any one of the means py; from 
the usual form 


estimate + fF OE duns 


using s, to estimate a. The confidence interval for 1; is 


S 
= P 
x, + t*—= 


Vai 
Use the critical value t* from the t distribution with N — I degrees of freedom 


because s, has N — I degrees of freedom. These are the confidence intervals that 
appear in Minitab ANOVA output. 


We can do the ANOVA test comparing the mean lengths of H. bihai, H. caribaea red, 
and H. caribaea yellow flower varieties using only the sample sizes, sample means, 
and sample standard deviations. These appear in Example 27.1, but it is easy to find 
them with a calculator. There are / = 3 groups with a total of N = 54 flowers. 

The overall mean of the 54 lengths in Table 27.1 is 


4X + NeX2 + 3X3 


N 

_ (16)(47.598) + (23)(39.711) + (15)(36.180) 
54 

= 2217.621 


= 41.067 
54 
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The mean square for groups is 


ni(X, — X)* + nz(X. — x)? + 3(X3 — x)? 


MSG = 
1-1 
; 
= 57 [(16)(47.598 — 41.067) + (23)(39.711 — 41.067 
+ (15)(36.180 — 41.067)?] 
_ 1082.996 


= 541.50 
2 


The mean square for error is 


(n; — 1)st + (nz — 1)s3 + (ng — 1)s3 


MSE = 
: N-I 
(15)(1.213°) + (22)(1.7997) + (14)(0.975°) 
7 51 
106.580 
= ——— = 2.09 
51 
Finally, the ANOVA test statistic is 
MSG 541.50 
P=. = = 259, 
MSE 2.09 auc 


Our work differs slightly from the output in Figure 27.4 because of roundoff error. 
We don't recommend doing these calculations, because tedium and roundoff errors 
cause frequent mistakes. m 


(3) LaunchPad Online Resources 


* The StatBoards video, Understanding the ANOVA Table, reviews the details 
of the ANOVA table. 


Apply Your Knowledge 


The calculations of ANOVA use only the sample sizes n;, the sample means x,, and the 
sample standard deviations s;. You can therefore re-create the ANOVA calculations when 
a report gives these summaries but does not give the actual data. These optional exercises 
ask you to do the ANOVA calculations starting with the summary statistics. P-values 
require either a table or software for the F distributions. 


27.12 Road Rage. Exercise 27.1 describes a study of road rage. Here are the 
means and standard deviations for a measure of “angry/threatening driving” 
for random samples of drivers in three age groups: 


Age Group n x s 

Less than 30 years 244 2.22) 3.11 
30 to 55 years 734. 1.330 2.21 
Over 55 years 364 0.66 1.60 


(a) The distributions of responses are somewhat right-skewed. ANOVA 
is nonetheless safe for these data. Why? 


(b) Check that the standard deviations satisfy the guideline for ANOVA 


inference. 
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(c) Calculate the overall mean response x, the mean squares MSG and 
MSE, and the ANOVA F statistic. 


(d) Which F distribution would you use to find the P-value of the ANOVA 
F test? Software gives P < 0.001. Write a brief conclusion based on 
the sample means and the ANOVA F test. 


27.13 Angry Women, Sad Men. Exercise 27.2 describes a study in which sub- 
jects conferred status on men and women who were displaying either anger 
or sadness. Subjects were randomly assigned to the four treatments, with 17 
subjects assigned to each treatment. The study report contains the follow- 
ing information about status measure conferred for each of the four groups: 


Treatment n x s 

Males expressing anger 17 6.47 = 2.25 
Females expressing anger 17) 3.75 1.77 
Males expressing sadness 17. 4.05 1.61 


Females expressing sadness 17 5.02 —-1.80 


(a) Do the standard deviations satisfy the rule of thumb for safe use of 
ANOVA? 


(b) Calculate the overall mean response x, the mean squares MSG and 
MSE, and the F statistic. 


(c) Which F distribution would you use to find the P-value of the ANOVA 
F test? Write a brief conclusion based on the sample means and the 
ANOVA F test. 


27.14 Attitudes toward Math. Do high school students from different racial/ 
ethnic groups have different attitudes toward mathematics? Measure the 
level of interest in mathematics on a five-point scale for a national random 
sample of students. Here are summaries for students who were taking math 
at the time of the survey:? 


Racial/Ethnic Group n x s 

African American 809 2.57 1.40 
White 1860 2.32 1.36 
Asian/Pacific Islander 654 2.63 1.32 
Hispanic 883 2.51 1.31 
Native American 207 25) 128 


—_ 
ie) 
oa 


The conditions for ANOVA are clearly satisfied. Explain why. 
Calculate the ANOVA table and the F statistic. 


Software gives P < 0.001. What explains the small P-value? Do you 
think the differences are large enough to be important? 


~ yD 
(oa m 
Se AS 


CHAPTER 27 SUMMARY 


Chapter Specifics 


m One-way analysis of variance (ANOVA) compares the means of several populations. 
The ANOVA F test tests the null hypothesis that all the populations have the same 
mean. If the F test shows significant differences, examine the data to see where the 
differences lie and whether they are large enough to be important. 
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mu The conditions for ANOVA state that we have an independent SRS from each popu- 
lation; that each population has a Normal distribution; and that all populations have 
the same standard deviation. 


m Inpractice, ANOVA inference is relatively robust when the populations are non-Normal, 
especially when the samples are large. Before doing the F test, check the observations in 
each sample for outliers or strong skewness. Also verify that the largest sample standard 
deviation is no more than twice as large as the smallest standard deviation. 


m When the null hypothesis is true, the ANOVA F statistic for comparing I means from 
a total of N observations in all samples combined has the F distribution with I — 1 
and N — I degrees of freedom. 


m ANOVA calculations are reported in an ANOVA table that gives sums of squares, 
mean squares, and degrees of freedom for variation among groups and for variation 
within groups. In practice, we use software to do the calculations. 


STATISTICS IN SUMMARY 


Here are the most important skills you should have acquired from reading this chapter. 
A. Recognition 


1. Recognize when testing the equality of several means is helpful in understanding 
data. 


2. Recognize that the statistical significance of differences among sample means 
depends on the sizes of the samples and on how much variation there is within 
the samples. 


3. Recognize when you can safely use ANOVA to compare means. Check the data 
production, the presence of outliers, and the sample standard deviations for the 
groups you want to compare. 


B. Interpreting ANOVA 
1. Explain what null hypothesis F tests in a specific setting. 
2. Locate the F statistic and its P-value on the output of analysis of variance software. 


3. Find the degrees of freedom for the F statistic from the number and sizes of the 
samples. 


4. If the test is significant, use graphs and descriptive statistics to see what differences 
among the means are most important. 


Link It 


Analysis of variance is a general statistical method for studying sources of variation in a 
response. In this chapter, we have studied one-way ANOVA, which is a specific statistical 
technique designed to test the null hypothesis of the equality of the means of several popu- 
lations. As such, it is an extension of the two sample t test of Chapter 21 which tested the 
null hypothesis of the equality of two population means. 

When we reject the null hypothesis of equality of two means with the two-sample t 
test, then we have evidence that the two means are different. Rejection of the null hypoth- 
esis with a one-way ANOVA is a more ambiguous conclusion—it is evidence of a differ- 
ence in the means of the populations, but the result does not tell us which differences be- 
tween the means are statistically significant. This is similar to the chi-square test in Chapter 
25, where rejection of the null hypothesis indicates that there is a relationship between the 
two categorical variables, but says nothing about the nature of that relationship. Typically, 
after rejection of the null hypothesis in a one-way ANOVA, we perform a more detailed 
follow-up analysis to decide which of the means are different and to estimate how large these 
differences are. The companion Chapter 30, available online, presents some details of this 
follow-up inference. 
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() LaunchPad Online Resources 


If you are having difficulty with any of the sections of this chapter, these online 
resources should help prepare you to solve the exercises at the end of this chapter. 


* StatTutor starts with a video review of each section and asks a series of 
questions to check your understanding. 


» LearningCurve provides you with a series of questions about the chapter 
geared to your level of understanding. 
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CHAPTER 27 


One-Way Analysis of Variance: Comparing Several Means 


| CHECK YOUR SKILLS 


27.15 


27.16 


27.17 


27.18 


The purpose of analysis of variance is to compare 


(a) the variances of several populations. 
(b) the proportions of successes in several populations. 
(c) the means of several populations. 


A study examined the effects of Vitamin E and Me- 
mantine on the functional decline of patients in the 
early stages of Alzheimer’s disease. The investigators 
took 450 patients and randomly divided them into 
three groups, each containing 150 patients. One group 
was assigned to Vitamin E, one group to Memantine, 
and the last group to a placebo. The primary response 
was their score on the Alzheimer’s Disease Cooperative 
Study/Activities of Daily Living (ADCS-ADL) Inven- 
tory, which is designed to assess a patient’s functional 
ability to perform a range of daily living activities, with 
lower scores indicating worse function. The degrees of 
freedom for the ANOVA F statistic comparing the 
mean ADCS-ADL inventory scores are 


(a) 2and 147. (b) 2 and 447. (c) 3 and 147. 


The alternative hypothesis for the ANOVA F test in 
Exercise 27.16 is 


(a) either the Vitamin E or the Memantine treat- 
ment has higher mean ADCS-ADL inventory 
scores than the placebo treatment. 

(b) both the Vitamin E and Memantine treatments 
have higher mean ADCS-ADL inventory scores 
than the placebo treatment. 

(c) the mean ADCS-AD inventory scores for the 
three groups are not all the same. 


The F distributions are 


(a) a family of distributions with bell-shaped density 
curves centered at 0. 

(b) a family of distributions that are right-skewed 
and take only values greater than 0. 

(c) a family of distributions that are left-skewed and 
take values between 0 and 1. 


“Durable press” cotton fabrics are treated to improve their recovery 
from wrinkles after washing. Unfortunately, the treatment also re- 
duces the strength of the fabric. This study compared the breaking 
strength of fabrics treated by three commercial durable press pro- 
cesses. Five specimens of the same fabric were assigned at random 
to each group. Here are the data, in pounds of pull needed to tear 


the fabric: & 


| WEAKFAB 


Permafresh 55 29.9 30.7 30.0 29.5 27.6 
Permafresh 48 24.8 24.6 27.3 28.1 30.3 
Hylite LF Mets, — 723)89) TO). A 


Here is a partial Minitab output including the ANOVA table (several 
numbers have been omitted), along with the means and standard de- 
viations of the breaking strengths for the three durable press processes: 


Source DF Ss MS PB 
Process 2 47.50 0.026 
Error 12 56.74 

Total 14 104.24 

Level N Mean StDev 

Perm55 5 MSY PAO) 2.67 

Perm48 5 2102 2339 

Hylite 5 29.54 ALY 


Exercises 27.19 to 27.22 are based on this study. 


27.19 


27.20 


27.21 


27.22 


27.23 


The conclusion of the ANOVA test is that 


(a) there is strong evidence (P = 0.026) that the 
mean breaking strength is not the same for all three 
treatments. 

(b) there is strong evidence (P = 0.026) that the 
mean breaking strength is higher for the Hylite 
process than for the other two processes. 

(c) the data give no evidence (P = 0.026) to 
suggest that mean breaking strength differs among 
the three processes. 


The P-value in the output may not be accurate 
because 


(a) there are several extreme outliers in the data. 
(b) three treatments are too few for ANOVA. 

(c) the data show some evidence of a violation of 
the assumption that the three populations have the 
same standard deviation. 


To compare the treatments, we might use three 90% 
two-sample t confidence intervals to compare each 
pair of treatments: Perm55 versus Perm48, Perm55 
versus Hylite, and Perm48 versus Hylite. The weak- 
ness of doing this is that 


(a) we don’t know how confident we can be that all 
three intervals cover the true differences in means. 
(b) 90% confidence is okay for one comparison, but 
it isn’t high enough for six comparisons done at once. 
(c) we can’t compare two treatments that use 
different processes. 


(Optional topic) The value of the ANOVA F- statistic 
for testing equality of the population means of the 
three processes is 


(a) 0.84. (b) 5.02. (c) 10.04. 


A company runs a three-day workshop on strategies 
for working effectively in teams. On each day, a dif- 
ferent strategy is presented. Forty-eight employees 
of the company attend the workshop. At the outset, 
all 48 are divided into 12 teams of four. The teams 
remain the same for the entire workshop. Strategies 
are presented in the morning. In the afternoon, the 
teams are presented with a series of small tasks, and 
the number of these completed successfully using 
the strategy taught that morning is recorded for 
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each team. The mean number of tasks completed 
successfully by all teams each day and the standard 
deviation follow: 


DAY n x s 
i WE Gis) 7.10 
2 IW Ay SNA 
3 12) 17.21 = 14.03 


Exercises 27.24 to 27.27 describe situations in which we want to 
compare the mean responses in several populations. For each setting, 
identify the populations and the response variable. Then give I, n;, and 
N. Finally, give the degrees of freedom of the ANOVA F statistic. 


27.24 


27.25 


27.26 


Morning or evening? Are you a morning person, an 
evening person, or neither? Does this personality trait 
affect how well you perform? A sample of 100 stu- 
dents took a psychological test that found 16 morn- 
ing people, 30 evening people, and 54 who were nei- 
ther. All the students then took a test of their ability 
to memorize at 8 a.m. and again at 9 p.m. The re- 
sponse variable is the score at 8 a.m. minus the score 
at 9 P.M. 


Does art sell products? How does visual art affect 
the perception and evaluation of consumer prod- 
ucts? Subjects were asked to evaluate an advertise- 
ment for bathroom fittings that contained an art 
image, a nonart image, or no image. The art image 
was Vermeer’s painting Girl with a Pearl Earring, 
and the nonart image was a photograph of the ac- 
tress Scarlett Johansson in the same pose wearing 
the same garments as the girl in the painting and 
was taken from the motion picture Girl with a Pearl 
Earring. Thus the art and nonart image were a 
match on content. College students were divided at 
random into three groups of 39 each, with each 
group assigned to one of the three types of adver- 
tisements. Students evaluated the product in the 
advertisement on a scale of 1 to 7, with 1 being the 
most unfavorable rating and 7 being the most fa- 
vorable. The paper reported a one way ANOVA on 
the product evaluation index had F = 6.29 with 
IP = 0S 


Test accommodations. Many states require school- 
children to take regular statewide tests to assess their 
progress. Children with learning disabilities who read 
poorly may not do well on mathematics tests because 
they can’t read the problems. Most states allow “ac- 
commodations” for learning-disabled children. Ran- 
domly assign 100 learning-disabled children in equal 
numbers to three types of accommodation and a con- 


27.27 


27.28 
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In this example, we notice 


(a) the data show very strong evidence of a 
violation of the assumption that the three popula- 
tions have the same standard deviation. 

(b) ANOVA cannot be used on these data because 
the sample sizes are less than 20. 

(c) the assumption that the data are independent 
for the three days is unreasonable because the same 
teams were observed each day. 


trol group: math problems are read by a teacher, math 
problems are read by a computer, math problems are 
read by a computer that also shows a video, and stan- 
dard test conditions. The researcher would like to 
compare the mean scores on the state mathematics 
assessment. 


Exercise and type 2 diabetes. It is generally accept- 
ed that regular exercise provides health benefits to 
individuals with type 2 diabetes, although the exact 
exercise regimen (aerobic vs. resistance vs. both) is 
unclear. The subjects in this study were sedentary 
30- to 75-year-old adults with type 2 diabetes and 
elevated hemoglobin Alc levels above 6.5%. The 
level of hemoglobin Alc correlates very well with a 
person’s recent overall blood sugar levels. If the 
blood sugars have generally been running high dur- 
ing the previous few months, the level of hemoglobin 
Alc will be high. In a randomized controlled study, 
41 subjects were assigned to a nonexercise control 
group, 73 to resistance training only, 72 to aerobic 
exercise only, and 76 to combined aerobic and resis- 
tance training. The weekly duration of exercise was 
similar for all three exercise groups, and subjects re- 
mained on the exercise regimens for nine months. 
At the end of nine months, the hemoglobin Alc 
levels of subjects were measured.'” 


Don’t handle the merchandise? Although consum- 
ers often want to touch products before purchasing 
them, they generally prefer that others have not 
touched products they would like to buy. Can an- 
other person touching a product create a positive 
reaction? Subjects were given instructions to con- 
tact a sales associate at a university bookstore who 
would provide them with a shirt to try on. When 
meeting the sales associate, subjects were told there 
was only one shirt left and it was being tried on by 
another “customer.” The other customer trying on 
the shirt was a confederate of the experimenter and 
was either an attractive, well-dressed professional 
female model or an average-looking female college 
student wearing jeans and a tee shirt. Subjects, who 
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were either males or females, saw the confederate 
leaving the dressing room where the shirt was left 
for them to try on. There was also a control group of 
subjects who were handed the shirt directly off the 
rack by the sales associate. Thus there were five 
treatments: male subjects seeing a model, female 
subjects seeing a model, male subjects seeing a col- 
lege student, female subjects seeing a college stu- 
dent, and the control group. Subjects evaluated the 
product on five dimensions, each dimension on a 
seven-point scale, with the five scores then aver- 
aged to give the subject’s evaluation measure, with 
higher numbers indicating a more positive evalua- 
tion. Here are the sample sizes, means, and standard 
deviations for the five groups: 


TREATMENT GROUP n x s 

Males seeing a model 22 5.34 0.87 
Males seeing a student M3 33M) il 
Females seeing a model 24 410 1.32 
Females seeing a student 23 3.50 1.43 
Controls 27. «4.17 —-1.50 


(a) Verify that the sample standard deviations allow 
the use of ANOVA to compare the population 
means. What do the means suggest about the effect 
of the subject’s sex and the attractiveness of the 
confederate on the evaluation of the product? 

(b) The paper reports the ANOVA F = 8.30. 
What are the degrees of freedom for the ANOVA 
F statistic and the P-value? State your conclusions. 


Plants defend themselves. When some plants are 
attacked by leaf-eating insects, they release chemical 
compounds that attract other insects that prey on 
the leaf-eaters. A study carried out on plants growing 
naturally in the Utah desert demonstrated both the 
release of the compounds and that they not only re- 
pel the leaf-eaters but also attract predators that act 
as the plants’ bodyguards.'* The investigators chose 
eight plants attacked by each of three leaf-eaters and 
eight more that were undamaged, 32 plants of the 
same species in all. They then measured emissions of 
several compounds during seven hours. Here are 
data (mean + standard error of the mean for eight 
plants) for one compound. The emission rate is mea- 
sured in nanograms (ng) per hour. 


EMISSION RATE 


GROUP (ng/h) 

Control 7) a= Bois} 
Hornworm 31.03 + 8.75 
Leaf bug 18.97 + 6.64 
Flea beetle 27.12 + 8.62 


27.30 


27.31 
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(a) Make a graph that compares the mean emission 
rates for the four groups. Does it appear that 
emissions increase when the plant is attacked? 

(b) What hypotheses does ANOVA test in this setting? 
(c) We do not have the full data. What would you 
look for in deciding whether you can safely use 
ANOVA? 

(d) What is the relationship between the standard 
error of the mean (SEM) and the standard devia- 
tion for a sample? What are the four sample 
standard deviations? Do they satisfy our rule of 
thumb for safe use of ANOVA? 


Can you _ hear 
these words? To 
test whether a 
hearing aid is 
right for a patient, 
audiologists play 
a tape on which 
words are pro- 
nounced at low 
volume. The patient tries to repeat the words. There 
are several different lists of words that are supposed 
to be equally difficult. Are the lists equally difficult 
when there is background noise? To find out, an ex- 
perimenter had subjects with normal hearing listen 
to four lists with a noisy background. The response 
variable was the percent of the 50 words in a list that 
the subject repeated correctly. The data set contains 
96 responses.’ Here are two study designs that could 
produce these data: 


Phanie/Science Source 


Design A. The experimenter assigns 96 sub- 
jects to four groups at random. Each group of 
24 subjects listens to one of the lists. All indi- 
viduals listen and respond separately. 


Design B. The experimenter has 24 subjects. 
Each subject listens to all four lists in random 
order. All individuals listen and respond 
separately. 


Does Design A allow use of one-way ANOVA to 
compare the lists? Does Design B allow use of one- 
way ANOVA to compare the lists? Briefly explain 
your answers. 


More rain for California? The changing climate will 
probably bring more rain to California, but we don’t 
know whether the additional rain will come during 
the winter wet season or extend into the long dry 
season in spring and summer. Kenwyn Suttle of the 
University of California at Berkeley and his co-workers 
randomly assigned plots of open grassland to three 
treatments: added water equal to 20% of annual 
rainfall either during January to March (winter) or 
during April to June (spring), and no added water 


FIGURE 27.11 

Minitab ANOVA output for comparing 
the total plant biomass of grassland 
plots under different water conditions, 
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One-way ANOVA: biomass versus treatment 


for Exercise 27.31. pours a 22 i 3 : 
Treatment 2 97583; 48792 27.52 0.000 
Error a) 26593 EUS 
Total 17 124176 
S = 42.11 R-Sq = 78.58% R-Sq(adj) = 75.73% 
Individual 95% CIs For Mean Based on 
Pooled StDev 
Level N Mean StDev ----~~ Sp OSE SSS Se he Se te 
Controle <6: 8136.65: » 21698 Soa laa) 
Spring 6. S15..39) Ses (----*----) 
Winter 6 205s? ica ay (----*----) 
ee a a ———— 
140 210 280 350 
Pooled StDev = 42.11 
(control). Here are some of the data, for plant bio- (b) State Hy and H, for the ANOVA F test, and 
mass (in grams per square meter) produced by each explain in words what ANOVA tests in this setting. 
plot in a single year.'° £jj,) Mass2003 (c) Report your overall conclusions about the effect 
of added water on plant growth in California. 
WINTER SPRING = CONTROL 27.32 Can you hear these words? Figure 27.12 displays 
264.1514 318.4182 129.0538 the Minitab output for one-way ANOVA applied to 
187.7312 281.6830 144.6578 the hearing data described in Design A in Exercise 
291.1431 288.8433 172.7772 cee bares aes rs i: ae as he 
176.2879 382.6673. 113.2813 ic enti ies the four lists of wor s. Based on this analy- 
sis, is there good reason to think that the four lists 
141.7525 326.8877 142.1562 are not all equally difficult? Write a brief summary of 
169.9737 293.8502 117.9808 the study findings. 
27.33 College students need better sleep! A random sam- 


Figure 27.11 shows Minitab ANOVA output for 
these data. 


(a) Make side-by-side stemplots of plant biomass for 
the three treatments, as well as a table of the 
sample means and standard deviations. What do 
the data appear to show about the effect of extra 
water in winter and in spring on biomass? Do these 


data satisfy the conditions for ANOVA? 


€@ Session 


ple of 898 students between the ages of 20 and 24 at 
a large midwestern university completed a survey in- 
cluding questions about their sleep quality, moods, 
academic performance, physical health, and psycho- 
active drug use. Sleep quality was measured using 
the Pittsburgh Sleep Quality Index (PSQI), with stu- 
dents scoring less than or equal to 5 on the index 
classified as optimal sleepers, those scoring a 6 or 7 


Analysis of Variance for Percent 

MS 
306.8 
62.4 


DF 

3 
92 
95 


Source 
List 
Error 
Total 


Ss 
920.5 
5738.2 
6658.6 


FIGURE 27.12 

Minitab ANOVA output for comparing 
the percents heard correctly in four lists 
of words, for Exercise 27.32. 


Pooled StDev 7.898 


Individual 95% CIs For Mean 
Based on Pooled StDev 
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classified as borderline, and those scoring over 7 
classified as poor sleepers. The depression subscale of 
the Profile of Moods State (POMS) was used to as- 
sess how severely students experienced depression 
on a typical day, with high scores indicating greater 


One-Way Analysis of Variance: Comparing Several Means 


LAVENDER ODOR 


levels of depression. We want to know if there is a ZnS ee ae eee 
significant difference in depression scores among the 124 105 129 =103 107_=—«2109S 94 = 105_—s102'_—=*d108 
three classifications of sleep.!” The full data setistoo 95 121 109 104 116 881099701106 
large to print here, but here are the first seven indi- LEMON ODOR 
viduals: Fj)) DEPRESED 78 104 74 75 112 88 105 97 101 89 
88 73 94 63 83 108 91 88 83 106 
Quality of 108 60 96 94 56 90 113 97 
sleep: poor poor border poor poor optimal border NO ODOR 
Peon ae fs (Cn nn ne 
= 85 69 73 87 109 115 91 84 76 % 
(a) Follow the four-step process in data analysis and a NS eee 
ANOVA. Be sure to check the conditions for 27.35 Good weather and tipping. Favorable weather has 


27.34 


ANOVA and to include an appropriate graph that 
compares the depression scores for the three 
qualities of sleep. 

(b) Explain why we can trust the ANOVA F test to 
give valid results for these data. 

(c) This is an observational study. Explain why. In 
this study, we can see why association does not 
prove causation. Explain how poor-quality sleep 
might lead to higher levels of depression. Then 
explain how higher depression scores might affect 
quality of sleep. You see that a cause-and-effect 
relationship might go in either direction. 


Do good smells bring good business? Businesses 
know that customers often respond to background 
music. Do they also respond to odors? Nicolas 
Guéguen and his colleagues studied this question 
in a small pizza restaurant in France on Saturday 
evenings in May. On one evening, a relaxing lav- 
ender odor was spread through the restaurant; on 
another evening, a stimulating lemon odor; a third 
evening served as a control, with no odor. The 
three evenings were comparable in many ways 
(weather, customer count, and so on), so we are 
willing to regard the data as independent SRSs 
from spring Saturday evenings at this restaurant. 
Table 27.4 contains data on how long (in minutes) 
customers stayed in the restaurant on each of the 
three evenings.'® Ej) opors 


(a) Make an appropriate graph comparing the 
customer times for each evening. Do any of the 
distributions show outliers, strong skewness, or 
other clear deviations from Normality? 

(b) Do a complete analysis to see whether the 
groups differ in the average amount of time spent in 
the restaurant. Follow the four-step process in your 
work. Did you find anything surprising? 


been shown to be associated with increased tipping. 
Will just the belief that future weather will be favor- 
able lead to higher tips? The researchers gave 60 in- 
dex cards to a waitress at an Italian restaurant in New 
Jersey. Before delivering the bill to each customer, 
the waitress randomly selected a card and wrote on 
the bill the same message that was printed on the in- 
dex card. Twenty of the cards had the message “The 
weather is supposed to be really good tomorrow. | 
hope you enjoy the day!” Another 20 cards contained 
the message, “The weather is supposed to be not so 
good tomorrow. I hope you enjoy the day anyway!” 
The remaining 20 cards were blank, indicating that 
the waitress was not supposed to write any message. 
Choosing a card at random ensured that there was a 
random assignment of the diners to the three experi- 
mental conditions. Here are the percentage tips for 


the three messages:'? Ej) TIPPING 

Good weather 20.8 18.7 20.6 22.0 23.4 22.8 

report 24.9 22.2 24.9 22.3 27.0 20.4 
22.2 24.0 eM JES) ULSI 

Bad weather 18.0 19.0 18.8 184 19.0 18.5 

report 16.1 16.8 ALO) Sho} es) I) 
20.2 18.8 23.2 18.2 19.4 

No weather 19.9 16.0 20.1 19.3 19.2 18.0 

report 19.2 21.2 18.5 19.3 19.3 19.4 
10.8 19.1 Wey ILSs UMS 


Do the data support the hypothesis that there are dif- 
ferences among the tipping percentages for the three 
experimental conditions? Does a prediction of good 
weather seem to increase the tip percentage? Follow 
the four-step process in data analysis and ANOVA. 
Be sure to check the conditions for ANOVA and to 
include an appropriate graph which compares the tip- 
ping percentages for the three conditions. 


27.36 Bring your pen to class! The use of laptops for tak- 
ing notes is becoming more common despite the fact 
that there is some evidence that it is less effective 
than longhand notes for learning. In this study, 151 
subjects were randomly assigned to three groups and 
each group was asked to take notes in a different 
manner during a lecture lasting approximately 15 
minutes. Forty-eight subjects were randomly assigned 
to take notes using longhand in the usual way they 
would take notes in a class when they expected to be 
tested on the material, and fifty-one subjects received 
the same instructions but were told to take their 
notes on a laptop. The third “laptop-intervention” 
group contained fifty-two students. This group 
used a laptop but were told that those using a laptop 
often transcribed what they were hearing rather than 
thinking about it, and they were asked to try and 
write the notes in their own words. The study found 
little difference in factual learning, but those using 
longhand had greater conceptual learning than 
either of the other groups. Here we focus on the 
content of what they wrote, specifically the number 
of words.”° £93) wRDCOUNT 


(a) Begin by comparing the distribution of word 
count for the three treatment groups using both 
graphical and numerical measures. 

(b) Are the differences in the mean word counts for 
the three groups statistically significant? Carry out 
an appropriate analysis, making sure to check 
conditions, find the P-value, and state your conclu- 
sions in context. 


27.37 Logging in the rain forest: Species counts. Table 27.2 
gives data on the number of trees per forest plot, 
the number of species per plot, and species richness. 
Exercise 27.3 analyzed the effect of logging on num- 
ber of trees. Exercise 27.8 concludes that it would be 
risky to use ANOVA to analyze richness. Use soft- 
ware to analyze the effect of logging on the number of 


species. 


(a) Make a table of the group means and standard 
deviations. Do the standard deviations satisfy our 
rule of thumb for safe use of ANOVA? What do the 
means suggest about the effect of logging on the 
number of species? 

(b) Carry out the ANOVA. Report the F statistic 
and its P-value and state your conclusion. 


27.38 Inyour own words, please. The data in Exercise 27.36 
show that those using longhand write fewer words than 
those in either of the laptop groups. Do those using lap- 
tops have greater verbatim overlap with the lecture? 
For each set of notes, they compared three-word 
chunks of text in the notes with three-word chunks of 
text in the lecture transcription and reported a percent- 
age of matches, with a higher percentage suggesting 
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that the note taker had a greater tendency to transcribe 
the lecture verbatim rather than writing the notes in 
their own words. Give a complete analysis to compare 
the means of the percentage of matches for the three 
groups. £ij),| VERBATIM 


More rain for California? 
Exercise 27.31 describes a 
randomized experiment car- 
ried out by Kenwyn Suttle 
and his co-workers to exam- 
ine the effects of additional 
water on California grass- 
land. The experimental units 
are 18 plots of grassland, as- 
signed at random among three pa 
treatments: added water in 
the winter wet season, added 
water in the spring dry season, 


Suttle, K. B., M. A. Thomsen, and M. E. Power. 
and no added water (control “Species Interactions Reverse Grassland Responses 


group) . Field experiments, un-_ to Changing Climate.” Science 315.5812 (2007): 


like laboratory experiments, °4~?: 


are exposed to variations in the natural environment. The experiment 
therefore continued over five years, 2001 to 2005. Table 27.5 
(page 674) gives data on the total plant biomass (grams per square 
meter) that grew on each plot during each year.”! The “Plot” column 
shows how the random assignment of 18 of the 36 available plots 
worked. Exercises 27.39 to 27.41 are based on this information. 


27.39 Plot the means. Starting from the data in Table 27.5, 
you can calculate the mean plant biomass for each 
treatment in each year as follows: jj.) MAsSMEAN 


YEAR 
2001 2002 2003 2004 2005 
Winter | 132.58 203.33 205.17 223.58 332.84 
Spring 257.69 388.85 315.39 299.54 289.66 
Control 81.67 180.31 136.65 201.07 257.37 


(a) Plot the means for each of the three treatments 
against year, connecting the yearly means for each 
treatment by lines to show the pattern over time. Use 
the same plot for all three treatments, with a different 
color for each treatment. From this plot, you can get 
an overall picture of the experiment’s results. 

(b) Across all five years, does more water in the wet 
season increase plant growth? What about more 
water in the dry season? Which seasonal addition of 
water has the larger effect? 

(c) One-way ANOVAs comparing the mean plant 
biomass separately in each year find significant 
differences in three years and no significant 
difference in two years. Based on your plot, in 
which three years do you think the treatment 
means differ significantly? 
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One-Way Analysis of Variance: Comparing Several Means 


YEAR 

TREATMENT PLOT 2001 2002 2003 2004 2005 

Winter 3 136.8358 228.0717 264.1514 254.6453 344,3933 
Winter 8 iL aileyy 189.9505 187.7312 233.8155 203.3908 
Winter 14 136.1536 209.0485 291.1431 253.4506 331.9724 
Winter 20 121.6323 189.6755 176.2879 228.5882 388.1056 
Winter Ai 124.1459 188.0090 141.7525 158.6675 382.8617 
Winter By 125.2986 PAM 169.9737 MDDS 346.3042 
Spring at 338.1301 422.7411 318.4182 517.6650 344.0489 
Spring 11 291.8597 339.8243 281.6830 342.2825 261.8016 
Spring 18 244.8727 398.7296 288.8433 270.5785 262.7238 
Spring 22 234.6599 400.6878 382.6673 212.5324 316.9683 
Spring 25 197.5830 326.9497 326.8877 213.9879 224.1109 
Spring 35 239.0122 444.1556 293.8502 240.1927 328.2783 
Control 6 73.4288 148.8907 129.0538 178.9988 237.6596 
Control 110.6306 182.6762 144.6578 205.5165 281.1442 
Control ly 95.3405 196.8303 172.7772 242.6795 Nae 
Control 24 83.0584 186.1953 113.2813 231.7639 258.3631 
Control 28 30.5886 154.0401 142.1562 134.9847 235.8320 
Control 33) 96.9709 BGA 117.9808 212.4862 217.5060 


(d) In 2005, there were unusual late rains during 
the spring. How does the effect of this natural 
rainfall show up in your plot? (You see that it would 
not be wise to do an experiment like this in just 
one year.) 


The results for 2001. Your work in Exercise 27.31 
shows that there were significant differences in mean 
plant biomass among the three treatments in 2003. 
Do a complete analysis of the data for 2001 and re- 
port your conclusions. £j),)) MASSALL 


Conditions for ANOVA. Examine the data for the 
year 2004. The conditions for ANOVA inference 
are not met. In what way do these data fail to meet 
the conditions? (It is not very surprising that, in five 
ANOVAs, one will fail to satisfy our quite conserva- 
tive conditions.) {fj)) MASSALL 


Does playing video games make better surgeons? 
In laparoscopic surgery, a video camera and several 
thin instruments are inserted into the patient’s ab- 
dominal cavity. The surgeon uses the image from the 
video camera positioned inside the patient’s body to 
perform the procedure by manipulating the instru- 
ments that have been inserted. The Top Gun Lapa- 
roscopic Skills and Suturing Program was developed 
to help surgeons develop the skill set necessary for 
laparoscopic surgery. Because of the similarity in 
many of the skills involved in video games and lapa- 
roscopic surgery, it was hypothesized that surgeons 
with greater prior video game experience might acquire 


the skills required in laparoscopic surgery more eas- 
ily. Thirty-three surgeons participated in the study 
and were classified into the three categories—never 
used, under three hours, and more than three 
hours—depending on the number of hours they 
played video games at the height of their video game 
use. They also performed Top Gun drills, and re- 
ceived a score based on the time to complete the 
drill and the number of errors made, with lower 
scores indicating better performance. Here are the 
Top Gun scores and video game categories for the 
33 participants.” Egy) TOPGUN 


Never 9379 8302 5489 5334 4605 4789 9185 7216 9930 
played 4828 5655 4623 7778 8837 5947 
Under 5540 6259 5163 6149 4398 3968 7367 4217 5716 


three hours 


Three or 
more hours 


27.43 


7288 4010 4859 4432 4845 5394 2703 5797 3758 


Give a complete analysis to compare the means of the 
Top Gun performance scores for the three groups. 
This is an observational study. Explain why and indi- 
cate how this affects the conclusions that can be 
drawn. 


Which test? Example 27.4 describes one of the ex- 
periments done by Kathleen Vohs and her co- 
workers to demonstrate that even being reminded 
of money makes people more self-sufficient and 
less involved with other people. Here are three 


more of these experiments. For each experiment, 
which statistical test from Chapters 20 to 27 would 
you use, and why? 

(a) Randomly assign student subjects to money and 
control groups. The control group unscrambles 
neutral phrases, and the money group unscrambles 
money- oriented phrases, as described in Example 27.4. 
Then ask the subjects to volunteer to help the 
experimenter by coding data sheets, about five 
minutes per sheet. Subjects said how many sheets 
they would volunteer to code. Participants in the 
money condition volunteered to help code fewer 
data sheets than did participants in the control 
condition. 

(b) Randomly assign student subjects to high- 
money, low-money, and control groups. After playing 
Monopoly for a short time, the high-money group is 
left with $4000 in Monopoly money, the low-money 
group with $200, and the control group with no 
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money. Each subject is asked to imagine a future 
with lots of money (high-money group), a little 
money (low-money group), or just their future plans 
(control group). Another student walks in and spills 
a box of 27 pencils. How many pencils does the 
subject pick up? Participants in the high-money 
condition gathered fewer pencils than subjects in 
the other two groups. 

(c) Randomly assign student subjects to three 
groups. All do paperwork while a computer on the 
desk shows a screensaver of currency floating 
underwater (Group 1), a screensaver of fish 
swimming underwater (Group 2), or a blank screen 
(Group 3). Each subject must now develop an 
advertisement and can choose whether to work 
alone or with a partner. Count how many in each 
group make each choice. Choosing to perform the 
task with a co-worker was reduced among money 
condition participants. 


27.44 


27.45 
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Confidence in the banking system. The General Social Survey (GSS) is a 
sociological survey used to collect data on demographic characteristics and at- 
titudes of residents of the United States. The survey is conducted face-to-face 
with an in-person interview by the National Opinion Research Center at the 
University of Chicago, of a randomly selected sample of adults (18+). SDA (Sur- 
vey Documentation and Analysis) is a set of programs that allows you to analyze 
survey data and includes the GSS survey as part of its archive. Go to the website 
sda.berkeley.edu/ and click on Archive. Unless there is a more recent file, open 
the 1972-2012 cumulative data file (without the quick tables option). 


(a) In the Analysis tab on the top of the page, click on Comparison of Means. 
You are to do an ANOVA that examines how the mean age of the respondent 
varies with the person’s confidence in the banking and financial systems. To do 
this, type in the dependent variable as “Age” and the row (treatment) variable 
as “Confinan.” For the selection filter, type in “Year (2012),” or the most recent 
year available. For weight, change it to “noweight.” Finally, in table options, the 
only boxes that should be checked are “Std dev,” “N,” and “ANOVA stats.” Make 
sure checks are removed from the other boxes. Now click on Run the Table. 


(b) How many respondents are included in the analysis? What are the three 
means and standard deviations? In the ANOVA table, explain how the degrees 
of freedom were obtained. What are the F- and P-values? Write a brief report 
explaining the relationship between the average respondent’s age and their con- 
fidence in the banking system. 


Confidence in the banking system, continued. This exercise is a continu- 
ation of Exercise 27.44. You are going to download the data set and repro- 
duce the analysis, as well as provide some additional plots. First open the 
1972-2012 cumulative data file following the instructions in Exercise 27.44. 


(a) In the Download tab on the top of the page, click on Customized Subset. 
For the data file, if you highlight the CSV bubble, an Excel spreadsheet will be 


676 


CHAPTER 27 


One-Way Analysis of Variance: Comparing Several Means 


downloaded. For the selection filter, again type in “year (2012)” or the year used 
in Exercise 27.44. In the box for entering the names of individual variables, enter 
“age” and “confinan.” Click Continue on the bottom of the page. In the new 
window, click on Create the Files and in the next window click on Data Files. You 
can now either open or save the data file to your computer. 


(b) Import the data into your statistical software package. You first need to 
“clean” the data a little because there are observations for which either the “age” 
or “confinan” variable are missing. For the “confinan” variable, any value other 
than a 1, 2, or 3 is a missing value code. Delete these observations. For the “age” 
variable, the missing value codes are 0, 98, and 99. Eliminate any observations 
with these values for “age.” You should now have the same number of observa- 
tions as in Exercise 27.44. 

(c) Draw comparative boxplots of the age distribution for the three values of 
“confinan.” Describe the shapes of the three distributions. What information can 
you obtain from the boxplots that was not included in the output for Exercise 27.44? 


(d) Reproduce the one-way ANOVA table using your software. Your results 
should agree with Exercise 27.44. 
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Gallup had a known chance of being among those se- 
lected, but this chance depended on characteristics such 
as gender, age, race, Hispanic ethnicity, education, region, 
adults in the household, and phone status (cell phone 
only/landline only/both, cell phone mostly, and having an 
unlisted landline number). Gallup used special weights to 
adjust for differences in the probability of being selected 
to obtain an estimate of the proportion of all adults who 
owned a gun at the time of the poll in the population of all 
US. adults. The actual estimate used by Gallup was close 
to 37% and uses the result from the sample to estimate 
what is true for the population. 

Note that pennies have rims that make spinning more sta- 
ble. The probability of a head in spinning a coin depends 
on the type of coin and also on the surface. See Exercise 
21.3 for an account of 56% of heads in spinning a Belgian 
l-euro coin. Chance News 11.02 at www.dartmouth. 
edu/~chance reports about 45% heads in more than 
20,000 spins of American pennies by Robin Lock’s stu- 
dents at Saint Lawrence University. 


. The percentages were found at the GMAT website, 


www.gmac.com/market-intelligence-and- 
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research/gmat-statistics/profile-of- 
gmat-candidates.aspx. 


. Data for 2011 from the website of Statistics Canada, www. 


statcan.gc.ca. 


. You can find a mathematical explanation of Benford’s law 


in Ted Hill, “The first-digit phenomenon,” American Scientist, 
86 (1996), pp. 358-363; and Ted Hill, “The difficulty of fak- 
ing data,” Chance, 12, No. 3 (1999), pp. 27-31. Applications in 
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jofa/joaiss.htm. 
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com/pol1/1588/Children-Violence.aspx. 
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takers in 2013. These statistics are available online at 
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www.gallup.com/poll/15370/Party- 
Affiliation.aspx. 

Data from 2009 found online at http: //nhts.ornl. 
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See Note 16 for Chapter 1. 

National population estimates for July 1, 2012, at the Census 
Bureau website www. census . gov. The table omits people 
who consider themselves to belong to more than one race. 
Based on data from the 2012 Statistical Abstract of the 
United States, Table 58, at www. census .gov. 

This exercise is based on a famous example discussed in 
Part I] of The Design of Experiments by Sir R. A. Fisher. 
The example involves a lady tasting tea. Fisher used this 
example to illustrate the logic of hypothesis testing. Pre- 
sumably, if milk is added to hot tea, the milk can curdle 
and produce a slightly sour taste. This is avoided by adding 
the milk first. 
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Data provided by Patricia Heithaus and the Department 
of Biology at Kenyon College. 

E J. G. M. Klaassen and J. R. Magnus, “How to reduce 
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four years at Wimbledon,” in S. J. Haake and A. O. Coe 
(eds.), Tennis Science and Technology, Blackwell, 2000, 
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R. S. Gupta et al., “The Prevalence, Severity, and Distri- 
bution of Childhood Food Allergy in the United States,” 
Journal of Pediatrics, 128 (2011), pp. e9-e17. 

From the National Institutes of Health’s National 
Digestive Diseases Information Clearinghouse, found 
at wrongdiagnosis.com. 


. The probabilities given are realistic, according to the 
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repositoryfiles/ca3012p10-72079.pdf. 
Pew Internet at www.pewinternet.org provides a 
full disposition of all sampled numbers in many of its re- 
ports. The response rates vary somewhat, but the percent 
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. From Statistics Canada, Smoking 2011, at www. 


statcan.gc.ca. 


. The survey question is reported in Susanah Fox and Maeve Dun- 


can, “Tracking for Health’ January, 2013 at pewinternet. 
org. In fact, the estimate in the survey was 60% saying “Yes.” 
Information obtained from the Centers for Disease Control 
website, at www. cdc. gov. 

Information obtained from the Planned Parenthood website, 
at www. plannedparenthood.org. 

Results for the full 2013 season, at www. pgatour.com. 
Sales data from www. hyundainews.com/us/en-us/ 
Corporate/SalesReleases/PressReleases. 
aspx. 

Corby K. Martin et al., “Weight loss and retention rates 
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the effect of corporate partnership,” International Journal of 
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EstimateForPi / for an online demonstration of this idea. 
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Strictly speaking, the formula o/Vn for the standard 
deviation of x assumes that we draw an SRS of size 
n from an infinite population. If the population has 
finite size N, this standard deviation is multiplied by 
V1 — (n — 1)/(N — 1). This “finite population correc- 
tion” approaches 1 as N increases. When the population 
is at least 20 times as large as the sample, the correction 
factor is between about 0.97 and 1. It is reasonable to use 
the simpler form o/ Vn in these settings. 

Earnings for all 98,095 households were downloaded using 
the Census Bureau’s Data Ferret software. The histograms 
in Figure 15.4 were produced from the downloaded data. 
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See Note 10 for Chapter 3. 

Found online at pages. stern.nyu.edu/adamodar/ 
New Home Page/datafile/histret.html. 
Sophisticates will note that for compounding over several 
years we want the geometric mean return, which was 9.38%. 
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Ho: w = 0 
Hy: w > 0 
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Ho: wp = 0 
Hy w <0 


and interpret a null hypothesis such as Hy: w = O as 
implying either Ho: w < 0 or Ho: w = O for one-sided 
alternatives. 

Sir R. A. Fisher, one of the greatest statisticians and sci- 
entists of the 20th century, argued that “the null hypothesis 
must be exact, that is free from vagueness or ambiguity . . .” 
(see his book The Design of Experiments, Section 8 of Part II). 
His reason was that the null hypothesis must determine an 
exact distribution from which probability calculations can 
be made. He would advocate the use of Hp: fp = O and we 
have followed Fisher’s formulation in this textbook. 
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TABLES a 


Table entry for z is the area under the 
standard Normal curve to the left of z. 


z 00 01 .02 03 .04 .05 .06 .07 .08 .09 
—3.4 0003 0003 .0003 0003 .0003 0003 0003 0003 0003 0002 
=3,3 0005 0005 .0005 0004 0004 0004 0004 .0004 0004 .0003 
=3.2 0007 0007 .0006 0006 .0006 .0006 .0006 .0005 .0005 0005 
=3.1 0010 0009 .0009 0009 0008 0008 0008 .0008 0007 0007 
—3.0 0013 0013 0013 0012 0012 0011 0011 0011 .0010 .0010 
=19) 0019 0018 .0018 0017 .0016 .0016 0015 0015 .0014 0014 
= 1s 0026 0025 .0024 0023 0023 0022 0021 0021 .0020 0019 
nil 0035 0034 .0033 0032 0031 0030 .0029 0028 0027 .0026 
=19 0047 0045 .0044 0043 0041 .0040 .0039 0038 0037 .0036 
is) 0062 0060 0059 0057 .0055 .0054 0052 0051 0049 .0048 
—2.4 0082 0080 .0078 0075 .0073 0071 .0069 0068 .0066 0064 
=2:3 0107 0104 0102 0099 0096 0094 0091 .0089 .0087 0084 
=2,2 0139 0136 0132 0129 0125 0122 0119 0116 0113 0110 
—2.1 0179 0174 0170 0166 0162 0158 0154 0150 0146 0143 
—2.0 0228 0222 0217 0212 0207 0202 0197 0192 0188 0183 
=1L9 0287 0281 0274 0268 0262 0256 0250 0244 .0239 0233 
= 1s) 0359 0351 hOB 44 0336 0329 0322 0314 0307 0301 0294 
=f 0446 0436 0427 0418 .0409 .0401 0392 .0384 0375 .0367 
=IL0 0548 0537 0526 0516 0505 .0495 .0485 0475 0465 .0455 
=Il5) 0668 0655 .0643 0630 .0618 0606 0594 0582 0571 0559 
—1.4 0808 0793 0778 0764 0749 0735 0721 0708 0694 .0681 
=L3 0968 0951 .0934 0918 0901 .0885 0869 0853 0838 0823 
=12 1151 1131 1112 1093 1075 1056 1038 1020 1003 0985 
-11 1357 1335 1314 1292 1271 1251 1230 1210 1190 1170 
—1.0 1587 1562 1539 1515 1492 .1469 1446 1423 1401 1379 
=0.9 1841 1814 1788 1762 1736 awael 1685 1660 1635 1611 
=O 2119 2090 2061 2033 2005 1977 1949 1922 1894 1867 
—OL7 2420 2389 ANS 2327 2296 2266 2236 2206 AMET 2148 
—0.6 2743 2709 .2676 2643 2611 2578 2546 2514 2483 2451 
=0)5 3085 3050 3015 2981 2946 AMA Oe 2843 .2810 2776 
—0.4 3446 3409 3372 3336 3300 3264 3228 3192 3156 3121 
=0:3 3821 3783 3745 3707 3669 3632 3594 3557 3520 3483 
—0.2 4207 4168 4129 4090 4052 4013 3974 3936 3897 3859 
=0.1 4602 4562 4522 4483 4443 4404 4364 4325 4286 4247 
—0.0 5000 4960 4920 4880 4840 4801 4761 A721 .4681 4641 


Table entry for z is the area under the 
standard Normal curve to the left of z. 
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z .00 01 .02 03 .04 .05 .06 .07 .08 .09 
0.0 5000 5040 .5080 5120 5160 5199 5239 5279 5319 5359 
0.1 5398 5438 5478 5517 5557 5596 5636 5675 5714 5753 
0.2 5793 5832 5871 5910 5948 5987 .6026 .6064 6103 6141 
0.3 6179 6217 6255 6293 6331 6368 .6406 6443 .6480 6517 
0.4 6554 6591 .6628 6664 6700 6736 6772 6808 6844 .6879 
OS 6915 6950 .6985 7019 7054 7088 7123 7157 7190 7224 
0.6 257) 7291 Oa 7357 7389 7422 (AA: 7486 Id LY 
0.7 1580 7611 7642 7673 7704 4 7164 20S 1823 1852 
0.8 7881 7910 Bo) 7967 7995 8023 8051 .8078 8106 8133 
0.9 8159 8186 8212 8238 8264 8289 s)15) .8340 8365 8389 
1.0 8413 8438 8461 8485 8508 8531 8554 8577 8599 8621 
L.1 8643 8665 .8686 8708 8729 8749 8770 8790 .8810 .8830 
1.2 .8849 8869 .8888 8907 8925 8944 8962 .8980 8997 9015 
1.3 9032 9049 .9066 9082 9099 9115 9131 9147 9162 9177 
1.4 9192 9207 9222 9236 9251 9265 9279 9292 9306 9319 
5) B32 9345 9357 9370 9382 19394, 9406 9418 (9429 9441 
1.6 9452 9463 ATA 9484 9495 IOS 9515 D525 M335 9545 
7 S)514) 9564 M573 9582 9591 LEY .9608 9616 9625 9633 
IES 9641 9649 9656 9664 9671 9678 .9686 9693 .9699 .9706 
iL) SA i) 9726 9732 9738 DUBE 9750 9756 9761 9767 
2.0 9772 9778 9783 9788 9793 9798 .9803 9808 9812 9817 
2.1 9821 9826 9830 9834 9838 9842 9846 9850 9854 9857 
2.2 9861 9864 9868 9871 9875 9878 9881 9884 9887 9890 
2.3 9893 9896 9898 9901 9904 9906 9909 9911 9913 9916 
2.4 9918 9920 9922 9925 9927 9929 9931 9932 9934 9936 
2.5) 9938 9940 Sill 9943 9945 9946 9948 9949 9951 May 
2.6 9953 2255) 9956 9957 9959 9960 9961 9962 9963 9964 
Dell 9965 9966 9967 9968 9969 9970 9971 ODT I TA! 
2.8 9974 9975 9976 9977 9977 9978 SS) 9979 9980 9981 
28 9981 9982 9982 9983 9984 9984 9985 9985 9986 9986 
3.0 9987 9987 9987 9988 9988 9989 9989 9989 9990 9990 
3.1 9990 9991 9991 9991 9992 9992 9992 9992 9993 9993 
3.2 9993 9993 9994 9994 9994 9994 9994 9995 9995 9995 
3.3 9995 9995 9995 9996 9996 9996 9996 9996 9996 9997 
3.4 9997 9997 9997 9997 9997 9997 9997 9997 9997 9998 


700° 
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LINE 

101 19223 95034 05756 28713 96409 12531 42544 82853 
102 73676 47150 99400 01927 27754 42648 82425 36290 
103 45467 71709 77558 00095 32863 29485 82226 90056 
104 52711 38889 93074 60227 40011 85848 48767 52573 
105 95592 94007 69971 91481 60779 53791 17297 59335 
106 68417 35013 5529 72765 85089 57067 50211 47487 
107 82739 57890 20807 47511 81676 55300 94383 14893 
108 60940 72024 17868 24943 61790 90656 87964 18883 
109 36009 19365 4s 39638 85453 46816 83485 41979 
110 38448 48789 18338 24697 39364 42006 76688 08708 
111 81486 69487 60513 09297 00412 71238 27649 39950 
112 59636 88804 04634 71197 19352 73089 84898 45785 
113 62568 70206 40325 03699 71080 22553 11486 11776 
114 45149 32992 75730 66280 03819 56202 02938 70915 
115 61041 77684 94322 24709 73698 14526 31893 32592 
116 14459 26056 S44 80371 65103 62253 50490 61181 
117 38167 98532 62183 70632 23417 26185 41448 75532 
118 73190 62588 04470 29669 84407 90785 65956 86382 
Ie 95857 07118 87664 92099 58806 66979 98624 84826 
120 35476 55972 39421 65850 04266 35435 43742 11937 
121 71487 09984 29077 14863 61683 47052 62224 51025 
122 13873 81598 95052 90908 73592 75186 87136 95761 
123 54580 81507 27102 56027 55892 33063 41842 81868 
124 71035 09001 43367 49497 72719 96758 27611 91596 
125 96746 12149 37823 71868 18442 35119 62103 39244 
126 96927 19931 36809 74192 77567 88741 48409 41903 
127 43909 aril 25330 64359 40085 16925 85117 36071 
128 15689 14227 06565 14374 13352 49367 81982 87209 
129 36759 58984 68288 TIM) 18638 54303 00795 08727 
130 69051 64817 87174 09517 84534 06489 87201 97245 
131 05007 16632 81194 14873 04197 85576 45195 96565 
132 68732 55259 84292 08796 43165 93739 31685 97150 
133 45740 41807 65561 33302 07051 93623 18132 09547 
134 27816 78416 18329 21337 35213 37741 04312 68508 
135 66925 55658 39100 78458 11206 19876 87151 31260 
136 08421 BAL IS3 CUSUT 28744 75592 08563 79140 92454 
137 53645 66812 61421 47836 12609 15373 98481 14592 
138 66831 68908 40772 21558 47781 33586 TY 06928 
139 55588 99404 70708 41098 43563 56934 48394 51719 
140 12975 13258 13048 45144 (ZN 81940 00360 02428 
141 96767 35964 23822 96012 94591 65194 50842 53372 
142 72829 50232 97892 63408 77919 44575 24870 04178 
143 88565 42628 17797 49376 61762 16953 88604 12724 
144 62964 88145 83083 69453 46109 59505 69680 00900 
145 19687 12633 57857 95806 09931 02150 43163 58636 
146 37609 59057 66967 83401 60705 02384 90597 93600 
147 US 86278 88737 74351 47500 84552 19909 67181 
148 00694 05977 19664 65441 20903 62371 22725 53340 
149 71546 05233 53946 68743 72460 27601 45403 88692 
150 07511 88915 41267 16853 84569 79367 32337 03316 
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Table entry for C is the critical value 
t* required for confidence level C. 
To approximate one- and two-sided 
P-values, compare the value of the t 
statistic with the critical values of t* 
that match the P-values given at the 
bottom of the table. 


Tailarea 1=C 
2 


CONFIDENCE LEVEL C 


DEGREES OF 
FREEDOM 50% 60% 70% 80% 90% 95% 96% 98% 99% 99.5% 99.8% 99.9% 


1 1.000 1.376 1.963 3.078 6.314 12.71 15.89 31.82 63.66 127.3 3183 636.6 
2 0.816 1.061 1.386 1.886 2.920 4.303 4.849 6.965 9.925 14.09 22.33 31.60 
3 0.765 0.978 1.250 1.638 2.353 3.182 3482 4.541 5.841 7.453 10.21 12.92 
4 0.741 0.941 1.190 1.533 2.132 2.776 2.999 3.747 4.604 5.598 7.173 8.610 
5 0.727 0.920 1.156 1.476 2.015 2.571 2.757 3.365 4.032 4.773 5.893 «6.869 
6 0.718 0.906 1.134 1.440 1.943 2.447 2612 3.143 3.707 4.317 5.208 5.959 
7 0.711 0.896 1.119 1415 1895 2.365 2.517 2.998 3.499 4.029 4.785 5.408 
8 0.706 0.889 1.108 1.397 1860 2.306 2.449 2.896 3.355 3.833 4.501 5.041 
9 0.703 0.883 1.100 1.383 1.833 2.262 2398 2.821 3.250 3.690 4.297 4.781 
10 0.700 0.879 1.093 1.372 1.812 2.228 2359 2.764 3.169 3.581 4.144 4.587 
i 0.697 0.876 1.088 1.363 1.796 2.201 2328 2.718 3.106 3.497 4.025 4.437 
12 0.695 0.873 1.083 1.356 1.782 2.179 2303 2.681 3.055 3.428 3.930 4.318 
B 0.694 0.870 1.079 1.350 1.771 2.160 2.282 2.650 3.012 3.372 3.852 4.221 
14 0.692 0.868 1.076 1.345 1.761 2.145 2.264 2.624 2.977 3.326 3.787 4.140 
15 0.691 0.866 1.074 1.341 1.753 2.131 2.249 2.602 2.947 3.286 3.733 4.073 
16 0.690 0.865 1.071 1.337 1.746 2.120 2.235 2.583 2.921 3.252 3.686 4.015 
17 0.689 0.863 1.069 1.333 1.740 2.110 2.224 2.567 2.898 3.222 3.646 3.965 
18 0.688 0.862 1.067 1.330 1.734 2.101 2.214 2.552 2.878 3.197 3.611 3.922 
19 0.688 0.861 1.066 1.328 1.729 2.093 2.205 2.539 2.861 3.174 3.579 3.883 
20 0.687 0.860 1.064 1.325 1.725 2.086 2197 2.528 2.845 3.153 3.552 «3.850 
21 0.686 0.859 1.063 1.323 1.721 2.080 2189 2.518 2.831 3.135 3.527 3.819 
22 0.686 0.858 1.061 1.321 1.717 2.074 2183 2.508 2.819 3.119 3.505 3.792 
23 0.685 0.858 1.060 1.319 1.714 2.069 2177 2.500 2.807 3.104 3.485 3.768 
24 0.685 0.857 1.059 1.318 1.711 2.064 2172 2.492 2.797 3.091 3.467 3.745 
25 0.684 0.856 1.058 1.316 1.708 2.060 2167 2.485 2.787 3.078 3.450 ‘3.725 
26 0.684 0.856 1.058 1.315 1.706 2.056 2.162 2.479 2.779 3.067 3.435 3.707 
27 0.684 0.855 1.057 1.314 1.703 2.052 2158 2.473 2.771 3.057 3.421 3.690 
28 0.683 0.855 1.056 1.313 1.701 2.048 2.154 2.467 2.763 3.047 3.408 3.674 
29 0.683 0.854 1.055 1.311 1699 2.045 2150 2.462 2.756 3.038 3.396 3.659 
30 0.683 0.854 1.055 1.310 1697 2.042 2147 2.457 2.750 3.030 3.385 3.646 
40 0.681 0.851 1.050 1.303 1.684 2.021 2123 2.423 2.704 2.971 3.307 3.551 
50 0.679 0.849 1.047 1.299 1.676 2.009 2109 2.403 2.678 2.937 3.261 —-3.496 
60 0.679 0.848 1.045 1.296 1.671 2.000 2.099 2.390 2.660 2.915 3.232 3.460 
80 0.678 0.846 1.043 1.292 1.664 1.990 2.088 2.374 2.639 2.887 3.195 3.416 
100 0.677 0.845 1.042 1.290 1.660 1.984 2.081 2.364 2.626 2.871 3.174 3.390 
1000 0.675 0.842 1.037 1.282 1646 1.962 2.056 2.330 2.581 2.813 3.098 3.300 
a 0.674 0.841 1.036 1.282 1.645 1.960 2.054 2.326 2.576 2.807 3.091 3.291 
One-sidedP | .25 20 15 10 05 025 02 Ol 005 = -.0025..001_-~—S«.0005 


Two-sided P 50 40 30 .20 .10 05 04 02 O1 .005 002 001 
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Table entry for p is the critical value 
X* with probability p lying to its 


Probability p 


right. 
x 
P 
df .25 .20 15 -10 .05 .025 .02 .01 005 .0025 .001 £.0005 
1 1.32 1.64 2.07 2.71 3.84 5.02 5.41 6.63 7.88 9.14 10.83 12.12 
2 2.77 3.22 3.79 4.61 5.99 7.38 7.82 9.21 10.60 11.98 13.82 15.20 
3 4.11 4.64 5.32 6.25 7.81 9.35 984 11.34 12.84 14.32 16.27 = 17.73 
4 5.39 5.99 6.74 7.78 9.49 11.14 = 11.67 13.28 14.86 16.42 18.47 20.00 
5 6.63 7.29 8.12 9.24 11.07 12.83 1339 15.09 16.75 1839 20.51 22.11 
6 784 856 945 1064 12.59 14.45 15.03 1681 1855 20.25 22.46 24.10 
7 9.04 9.80 10.75 12.02 14.07 16.01 1662 1848 20.28 22.04 2432 26.02 
8 10.22 IGIEOE, 12.03 13.36 = 15.51 753  lshily 2009 Bi B07 Bolle Liss 


9 11.39 12.24 13.29 14.68 16.92 19.02 19.68 21.67 23.59 25.46 27.88 29.67 
10 W295) IDA Wa ID) eh OARS BILLION) BIA 


11 13.70 14.63 15.77 17.28 =19.68 21.92 22.62 24.72 26.76 28.73 31.26 = 33.14 
12 14.85 15.81 16.99 18.55 21.03 23.34 24.05 26.22 2830 30.32 32.91 34.82 
13 15.98 16.98 18.20 19.81 22.36 24.74 25.47 27.69 29.82 31.88 34.53 36.48 
14 17.12 18.15 19.41 21.06 23.68 26.12 2687 29.14 3132 33.43 36.12 38.11 
15 18.25 19.31 20.60 22.31 25.00 27.49 28.26 30.58 32.80 34.95 37.70 39.72 


16 19.37 20.47 21.79 23.54 2630 28.85 29.63 32.00 34.27 36.46 39.25 41.31 
17 DORI — Dildo = BAN DA BIS) SHO.) HILO) SL SI HS) HOLT) Ai she) 
18 21.60 22.76 24.16 25.99 28.87 31.53 32.35 34.81 37.16 39.42 4231 44.43 
19 22 ee 0) Oe OS O22, OD OO) OES ONO OO 0 COMMENT OOD MEETS | 
20 M353) PSO HO) Al NL SI SO i IOI) I SIL ATO 


21 24.93 26.17 27.66 29.62 32.67 35.48 36.34 38.93 41.40 43.78 46.80 49.01 
22 26.04 27.30 2882 30.81 33.92 36.78 37.66 40.29 42.80 45.20 48.27 50.51 
23 27.14 = 28.43 29.98 32.01 935.17 = 38.08 = 338.97 41.64 44.18 = 46.62 49.73 52.00 
24 28.24 9-29.55 31.13 33.20, 36.42 = 339.36 40.27 42.98 = 45.56 48.03 51.18 53.48 
25 29.34 30.68 32.28 34.38 37.65 40.65 41.57 4431 46.93 49.44 52.62 54.95 


26 HO049 BAD 3343} SSG SSIS) EO ADS So AS) — SONS) SO Ser Il 
Ril SS SL SS HOTA = HOLT NI)A IA, 4O)4IS ee SY SANS DUIS 
28 3 OB BSL PUL ML AK AAI AS SOD 3S SO = HD.) 
DY) Bodily IM OSS NO) aS) SL HO) AS) SY EO 3K) 73} 
30 S40 OAS 3789 402 43.77 48 4790 S089 S367 5033 59.00  O2IE 


40 45.62 47.27 49.24 51.81 55.76 59.34 60.44 63.69 66.77 69.70 73.40 76.09 
50 56.33 58.16 60.35 863.17 = 67.50 = 71.42) 72.61 76.15 79.49 82.66 = 86.66 ~— 89.56 
60 66.98 68.97 71.34 74.40 79.08 8330 8458 8838 91.95 95.34 99.61 102.7 
80 88.13 90.41 93.11 96.58 101.9 1066 108.1 112.3 116.3 120.1 124.8 = 128.3 
100 | 109.1 111.7 114.7 118.5 124.3 129.6 131.1 135.8 140.2 144.3 149.4 153.2 
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Table entry for p is the critical value 
r* of the correlation coefficient r 


with probability p lying to its right. Probability p 


UPPER TAIL PROBABILITY p 


n .20 .10 .05 .025 .02 01 .005 .0025 .001 .0005 
3 0.8090 0.9511 0.9877 0.9969 0.9980 0.9995 0.9999 1.0000 1.0000 1.0000 
4 0.6000 0.8000 0.9000 0.9500 0.9600 0.9800 0.9900 0.9950 0.9980 0.9990 
5 0.4919 0.6870 0.8054 0.8783 0.8953 0.9343 0.9587 0.9740 0.9859 0.9911 
6 
7 
8 


0.4257 0.6084 0.7293 0.8114 0.8319 0.8822 0.9172 0.9417 0.9633 0.9741 
0.3803 0.5509 0.6694 0.7545 0.7766 0.8329 0.8745 0.9056 0.9350 0.9509 


0.3468 = 0.5067 0.6215 0.7067 0.7295 0.7887 0.8343 0.8697 0.9049 0.9249 

9 0.3208 0.4716 0.5822 0.6664 0.6892 0.7498 0.7977 0.8359 0.8751 0.8983 
10 0.2998 0.4428 0.5494 0.6319 0.6546 0.7155 0.7646 0.8046 0.8467 0.8721 
11 0.2825 0.4187 0.5214 0.6021 0.6244 0.6851 0.7348 0.7759 0.8199 0.8470 
IZ 0.2678 0.3981 0.4973 0.5760 0.5980 0.6581 0.7079 0.7496 0.7950 0.8233 


13 0.2552 0.3802 0.4762 0.5529 0.5745 0.6339 0.6835 0.7255 0.7717 0.8010 
14 0.2443 0.3646 0.4575 0.5324 0.5536 0.6120 0.6614 0.7034 0.7501 0.7800 
15 0.2346 = 0.3507 0.4409 = 0.5140 0.5347 0.5923 0.6411 0.6831 0.7301 0.7604 
16 0.2260 0.3383 0.4259 = 0.4973 0.5177 0.5742 0.6226 0.6643 0.7114 0.7419 
17 0.2183 0.3271 0.4124 0.4821 0.5021 0.5577 0.6055 0.6470 0.6940 0.7247 


18 0.2113 0.3170 0.4000 0.4683 0.4878 0.5425 0.5897 0.6308 0.6777 0.7084 
19 0.2049 0.3077 0.3887 = 0.4555 0.4747 = 0.5285 0.5751 0.6158 0.6624 0.6932 
20 0.1991 0.2992 0.3783 0.4438 0.4626 0.5155 0.5614 0.6018 0.6481 0.6788 
Dil 0.1938 0.2914 0.3687 0.4329 0.4513 0.5034 0.5487 0.5886 0.6346 0.6652 
22 0.1888 0.2841 O35 OAT 0.4409 0.4921 0.5368 0.5763 0.6219 0.6524 


23 0.1843 0.2774 0.3515 0.4132 0.4311 0.4815 0.5256 0.5647 0.6099 0.6402 
24 0.1800 0.2711 0.3438 0.4044 0.4219 0.4716 0.5151 0.5537 0.5986 0.6287 
25 0.1760 0.2653 0.3365 0.3961 0.4133 0.4622 0.5052 0.5434 0.5879 0.6178 
26 0.1723 0.2598 0.3297 = 0.3882 0.4052 = 0.4534 0.4958 0.5336 0.5776 0.6074 
27 0.1688 0.2546 0.3233 0.3809 0.3976 0.4451 0.4869 0.5243 0.5679 0.5974 


28 0.1655 0.2497 0.3172 0.3739 0.3904 = 0.4372 0.4785 0.5154 0.5587 0.5880 
BY) 0.1624 0.2451 OSS 0.3673 0.3835 0.4297 0.4705 0.5070 0.5499 0.5790 
30 0.1594 = 0.2407 0.3061 0.3610 0.3770 0.4226 0.4629 0.4990 0.5415 0.5703 
40 0.1368 = 0.2070 0.2638 0.3120 0.3261 0.3665 0.4026 0.4353 0.4741 0.5007 
50 0.1217 0.1843 0.2353 0.2787 0.2915 0.3281 0.3610 0.3909 0.4267 0.4514 


60 0.1106 0.1678 0.2144 = 0.2542 0.2659 = 0.2997 0.3301 0.3578 0.3912 0.4143 
80 0.0954 0.1448 0.1852 0.2199 0.2301 0.2597 0.2864 0.3109 0.3405 0.3611 
100 0.0851 0.1292 0.1654 0.1966 0.2058 = 0.2324 0.2565 0.2786 0.3054 0.3242 
1000 0.0266 0.0406 0.0520 0.0620 0.0650 0.0736 0.0814 0.0887 0.0976 0.1039 


ANSWERS TO ODD-NUMBERED EXERCISES 


Chapter O Getting Started 


0.1: (a) More than likely the individuals who chose to take 
vitamin E were more health-conscious in general than those 
who didn’t. They may also have been more affluent (i-e., had 
money available to purchase the vitamins). (b) In a random- 
ized experiment, people of all types are randomly assigned to 
the treatments. 


0.3: (a) The proportion of Americans who feel this way is 
likely much lower. The “survey” used voluntary response (peo- 
ple made their own decision to participate or not). Also, those 
who knew about the “poll” had just watched Mr. Schultz’s 
long monologue. (b) As long as the poll was voluntary response, 
the sample size (868, 2500, or 100,000) doesn’t matter. 


Chapter 1 
with Graphs 


Picturing Distributions 


1.1: (a) The individuals are the car makes and models. (b) 
The variables are vehicle class (categorical), transmission type 
(categorical), number of cylinders (usually treated as quantita- 
tive), city mpg (quantitative), highway mpg (quantitative), 
and annual fuel cost (dollars, quantitative). 


1.3: (a) The given shares sum to 76.4%. 100% — 76.4% = 
23.6% listen to stations with other formats. (b) The bar graph 
is shown. 


Arbitron Radio Station Format Shares 
2-4 


20 + 


AS 


Se & © & & Ww & 
Ad oS geo Wy Wh ce 


SO a kg RY 
oe FSF SS 


Format 


(c) If you include a wedge for “Other format” that accounts for 
23.6% of the total, a pie chart is reasonable. 


1.5: A pie chart would make it more difficult to distinguish 
between the weekend days and the weekdays. Some births are 
scheduled (induced labor, for example), and probably are 
scheduled for weekdays. 


Births by Day of Week (2012) 


Sun Mon Tues Wed Thur Fri Sat 
Day 


1.7: Use the applet to answer these questions. 


1.9: (a) There are two clear peaks in the distribution. 
If we gave only one center, it would most likely not be truly 
representative. (b) Young boys might spend a lot of time 
outdoors playing and engaged in sports; their time in places 
where they would encounter ticks might well be less as they 
get older. With families and yard work, their time outside 
might increase. (c) Hiking in the woods at any age will 
make a person more likely to encounter ticks. (d) The 
histograms have the same shapes, but females have a slightly 
lower incidence rate at every age. Females possibly spend less 
time outdoors. 


1.11: A stemplot for health expenditure per capita (in PPP) 
is given. Stems are thousands, and are split. This distribution is 
right-skewed, with a single high outlier (United States). There 
seem to be two clusters of countries. The center of this distri- 
bution is around 20 ($2000 spent per capita), ignoring the out- 
lier. The distribution varies from 0| 1 ($100 spent per capita) to 
8|6 (about $8600 spent per capita). 
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1.13: 


1.15: (b) Square footage and average monthly gas bill are 
both quantitative variables. 


1.17: (b) 58% to 61% 


1.19: (b) 80%. There are 50 observations, so the center would be 
between the 25th and 26th observations; both of these are 80%. 


(a) the students 


1.21: (b) 92%. The stems are rounded to whole percents; 
you cannot make finer judgments. 


1.23: (a) Individuals are students who have finished medical 
school. (b) Five, in addition to “Name.” “Age” (in years) and 
“USMLE” (score points) are quantitative. The others are 
categorical. 

1.25: “Other colors” account for 3%. A bar graph would be 
an appropriate display. If you included the “other” category, a 
pie chart could be made. 


Most Popular North American Car Colors 2013 
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1.27: (a) A bar graph is given. (b) To make a pie chart, you 
would need to know the total number of deaths in this age 
group, or the number of deaths due to “other” causes. 


Causes of Death (2011, 15- to 24- year-olds) 
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1.29: (a) A bar graph is provided. (b) A pie chart would 
be inappropriate; these percentages aren't parts of a single 
“whole.” 


Canadian Universities Rated "Excellent" by Students 
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1.31: (a) Ignoring the four lower outliers, the distribution is 
roughly symmetric, centered at about 110, and ranging from 86 
to 136. (b) 64 of the 78 scores are more than 100. This is 82.1%. 


1.33: (1.) “Are you male or female” is Histogram (c). The 
difference in frequencies is likely to be smaller than the 
right-handed/left-handed. (2.) “Are you right-handed or left- 
handed?” is Histogram (b) because there are many more 
right-handed people than left handed people. (3.) “What is 
your height in inches?” is Histogram (d). Height distribution is 


likely to be symmetric. (4.) “How many minutes do you study 
on a typical weeknight?” is Histogram (a). Time spent studying 
may well be right-skewed, with most students spending less 
time studying, but some students studying a lot. 


1.35: (a) States vary in population. Nurses per 100,000 pro- 
vides a better measure of how many nurses are available to 
serve a state. (b) A histogram is provided. The District of 
Columbia, South Dakota, and Massachusetts are the three 


1.39: 
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The decline in population is not seen in the stemplot 


made in Exercise 1.37. 


nt of Pups Born (thousands) 
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St. Paul Island Fur Seal Pups 


states different from the others. Washington, D.C., and Mas- 180 4 
sachusetts are populous with large medical communities. It’s 160 + 
difficult to know why South Dakota would also have an unusu- Vel 
ally high number of nurses. 
120 4 
= 100-4 
Nurses per 100,000 People by State 8 
| 1980 1985 1990 1995 2000 2005 2010 2015 
Year 
10 + 
1.41: Coins with earlier (lower) dates are rarer. There are more 
3 3 coins with larger dates (newer coins) than with smaller dates. 
& 1.43: (a) Graph (a) appears to show the greatest increase. 
2 64 Vertical scaling can impact one’s perception of the data. (b) In 
§ 2000, tuition was a bit more than $4000 and rises to almost 
5 a $9000; this is an increase of almost $5000. Both plots describe 
< the same data. 
7 -| 1.45: (a) A time plot of ozone hole size (area) is provided. 
There was a trend until about 1995; after that, we see only 
ad year-to-year variability. The hole may have leveled out in 
400 600 800 1000 1200 1400 recent years; the overall trend since about 2006 might be a 


Nurses 


decrease in size. 


Ozone Hole Area 


1.37: The shape of the distribution is roughly symmetric we 
(maybe left skewed if we ignore the high outlier); 245 seems to s e. 
be a high outlier. The center is about 171. The data range from £ 
about 94 to about 245. B 204 
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18 | 22 capture the relationship between size of hole and year. 
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Chapter 2 Describing Distributions with 
Numbers 

24: x= 291.0 + ave ee 


+26 = 56.28 per milliliter. The mean is 
greater than most of the observations because of the two outliers. 


2.3: The mean travel time is x = 31.25 minutes. The median 
travel time is 22.5 minutes. The mean is significantly larger 
than the median due to the right skew in the distribution. 


2.5: A histogram is given. Note the right skew. The mean is 
larger than the median. x = 4.60566 and the median is 3.7034 


tons per person. 


2010 Per Capita CO> Emissions 
185 


164 


Number Countries 


0) 4 8 1 16 
COz 


2.7: (a) Minimum = 11, Q, = 18, Median = 21, Q; = 26, 
Maximum = 51. (b) The boxplot shows right skew in the 
distribution. There are several high outliers (which are most 
likely hybrid cars). Most software will identify the outliers 
automatically (if you do not want them identified, they are 
omitted from the graph). 


2014 Midsize MPG 


MPG 


2.9: IQR = 26 — 18 = 8,s0Q;4+ 1.5 X IQR = 26+ 1.5 X 
8 = 38. There are eight values greater than 38 that would be 
identified as potential outliers (40, 40, 40, 43, 45, 47, 50, and 
51). Because Q, — 1.55 X IQR = 18 — 1.5 X 8 = 6, there are 
no potential outliers on the low end. 


2.11: Both data sets have the same mean and standard devi- 
ation (about 7.5 and 2.0, respectively). Stemplots reveal that 
Data A has a very left-skewed distribution, while Data B has a 
slightly right-skewed distribution with a high outlier. 


A B 
1 3 
7 4 
5 | 257 
1 6|58 
2 7|079 
7711 &6|486 
211 9 
10 
11 
1215 


2.13: PLAN: Create side-by-side boxplots for the three types 
of plots and compute appropriate summary statistics. SOLVE: 
None of the distributions are symmetric; Group 2 (logged one 
year earlier) has a low outlier and Group 3 (logged eight years 
earlier) is clearly left-skewed, while Group 1 (never logged) 
appears right-skewed. Because of the non-symmetric shapes, 
compute the five-number summaries for each. 


Min Q, M Qs Max 


Group | (never logged) 160 «19.25 23 27.75 33 
Group 2 (logged 1 year earlier) 2 12 14.5 17.75 20 
Group 3 (logged 8 years earlier) 4 12 18 = 20.5 22 


Count of Trees in Borneo Rainforest Plots 
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(If you compute the means and standard deviations, they are: 
Group 1: x = 23.75, s = 5.07; Group 2: x = 14.08, s = 4.98; 
and Group 3: x = 15.78, s = 5.76.) CONCLUDE: It is clear 
from the boxplots and summary statistics that plots that have 
never been logged have more trees than either type of logged 
plot. Further, if we compare the distributions and summary 
statistics for the two different types of logged plots, it takes a 
long time for the rain forest to recover from having been 
logged. 


2.15: (b) 167.48 
2.17: (b) 151.6, 163.5, 168.25, 174.3, 177.6 


2.19: (a) 25%. Q; has 75% of observations equal to or less 
than its value. 


2.21: (c) 8.2 
2.23: (b) seconds 


2.25: The distribution of incomes in this group is almost cer- 
tainly right-skewed. The mean is $62,597 and the median is 
$50,281. 


2.27: With 849 colleges, the median location is (849 + 1)/2 = 
425, so the median is the 425th (ordered) endowment. The 
first quartile, Q,, is found by taking the median of the first 424 
endowments. This would be the (424 + 1)/2 = 212.5th 
endowment. Similarly, Q; is found as the 637.5th endowment 
(212.5 above the median). 


2.29: The boxplots do not reveal the gap in the 
South between the rates for Georgia and the District of 
Columbia. 


High School Graduates by Region 
SO) -| 
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2.31: A histogram of the survival times is given. The distri- 
bution is strongly right-skewed, with center around 100 days, 


709 


and range from about 0 to about 600 days. (b) Because of the 
extreme right skew, use the five-number summary: 43, 82.5, 
102.5, 151.5, 598 days. Notice that the median is closer to Q; 
than to Q3. 


Guinea Pig Survival After Infection 
30 5 
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2.33: (a) Symmetric distributions are best summarized using 
x and s. The distribution for the treatment group was right- 
skewed. The control distribution could be called symmetric, 
but it has a high outlier. (b) Removing the outliers reduces all 
three statistics. The mean decreased by 8.45 seconds, about 
double the decrease in the median (3.5 seconds). 


With outlier Without outlier 
Mean 59.7 51.25 
Standard deviation 63.0 50.97 
Median 61 57.5 


2.35: (a) The sixth observation must be placed at the 
median for the original five observations. (b) No matter 
where you put the seventh observation, the median is one of 
the two repeated values above, because it will be the fourth 
(ordered) observation. 


2.37: x = 8.83%, far from the national percentage of 12.5%. 
You can’t average averages. Some states, such as California 
and Florida, are larger and should carry more weight in the 
national percentage. 


2.39: Answers will vary, but a raise in the minimum 
wage will probably have a greater impact on the median 
income. Most Americans earn “middle income” or less; a 
few people earn huge amounts each year. The few large 
amounts will still pull the mean toward that end of the 
distribution. 
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2.41: Many answers are possible. Start by ensuring that the 
median is 12, by “locking” 12 as the fourth smallest value. 
We also have 4 specified as the minimum and 19 as the 
maximum, so the seven numbers must be 4, ___,__, 12, 
___,____, 19. With three numbers either side of the median, 
the quartiles will be in positions 2 and 6. 


2.43: (a) Weight losses that are negative are weight gains. 
(b) A side-by-side boxplot is provided. Gastric banding 
seems to produce higher weight losses. Because both distri- 
butions are somewhat right-skewed, the five-number sum- 
mary would be appropriate. The summary statistics are given. 
(c) It’s better to measure loss relative to initial weight. A loss 
of 5 kg would not mean the same if individuals started at 
different weights. Percent reduction in BMI would also be 
good. (d) If the subjects who dropped out had continued, 
the difference between these groups would be as great or 
greater because many of the “lifestyle” dropouts had nega- 
tive weight losses (i.e., weight gains), which would pull that 
group down. 


Obese Adolescents' Weight Losses 


75 4 . 
50 4 
2 
a 
8 25 4 
O44 
Banding Intervention 
Group 
Min Qa M Qs Max 
Banding —5.40 22.70 33.35 42.67 81.40 
Intervention —17.00 —4.38 1.70 12.57 34.60 


2.45: PLAN: Graph the returns with a histogram and a time 
plot. Based on those, compute and report appropriate sum- 
mary statistics. SOLVE: The histogram and summary statistics 
are given. CONCLUDE: The distribution of average returns is 
left-skewed. Most years, the average return is positive. Returns 
range from about —40% to 40%, with median return about 
16%. 


wilshire 5000 Returns 1972-2013 
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Return Percent 

Mean St.Dev. Minimum Q, Median Q, Maximum 
12.19 18.17 —37.23 0.98 16.06 26.77 37.38 


2.47: PLAN: Create side-by-side boxplots of the three distri- 
butions, and compute appropriate summary statistics. SOLVE: 
Side-by-side boxplots of tip results are given. Good weather 
forecasts generally yielded better tips than the other two, and 
a relatively symmetric distribution. The bad weather forecast 
had an outlier on each end, while the no-weather message had 
three low outliers. 


Percent Tips and Weather "Forecast" 
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Bad-weather | Good-weather  No-weather 
Mean St. Dev. Minimum Q, Median Q; Maximum 


Good weather 22.22 1.96 13.6 17.125 22.15 23.25 27.0 
Bad weather 18.18 210 13.6 17.125 18.45 19.15 Lak 
18.725 2.39 10.8 18.57 19.3 19.87 21.3 


No weather 


2.49: (a) Do side-by-side boxplots to compare. All three 
distributions are right-skewed with high outliers. The median 
increases slightly with increasing age (from 173 to 190 to 
204). We also see an increase in variability as people age. (b) 
25% or more of the individuals in each age group had total 
cholesterol levels above 200. Unless their original cholesterol 
levels were extremely high, the 4 or 24 people on medication in 
their 20s and 30s probably wouldn’t affect these distributions 
a great deal because there were roughly 950 people in each 
group. However, there were 1139 people in their 40s; 117 of 
those on medication is more than 10% of this group. If those 
117 had not been on medication, that distribution would 
likely show more variability and higher cholesterol readings. 


Total Cholesterol by Age Group 
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2.51: (a) Min = 1.3, Q,; = 4.1, Median = 6.2, Q; = 13.4, 
Max = 27.1. (b) High end outliers are larger than 13.4 + 
1.5 X (13.4 — 4.1) = 27.35. California (27.1% foreign-born) 


is not an outlier. 


2.53: Using Minitab, the five-number summary of choles- 
terol levels for people in their 20s is 92, 154, 173, 199, 318. We 
have IQR = 199 — 154 = 45. Outliers would be values smaller 
than 154 — 1.5 X 45 = 86.5 or larger than 199 + 1.5 X 45 = 
266.5. Using this criterion, there are no low outliers, but there 
are high end outliers. 


Chapter 3 The Normal Distributions 


3.1: Sketches will vary. (a) Symmetric distributions are mir- 
ror images on either side of the center. This distribution should 
have two humps (bimodal) (b) A distribution that is skewed to 
the left has a long left tail. 


3.3: uw = 2.5, the balance point. The median is also 2.5 
because the distribution is symmetric. 


3.5: A sketch of the Normal curve is given. The tick marks 
are placed at the mean, and at one, two, and three standard 
deviations above and below the mean. 


CHAPTER 3 @ The Normal Distributions | 711 


T T T T 
29.5 31.6 337 35.8 37.9 40.0 421 
Upper Arm Length (cm) 


3.7: (a) In 95% of all years, monsoon rain levels are 
between 852 + 2(82) = 688 and 1016 mm. (b) The driest 
2.5% of monsoon rainfalls are less than 688 mm (more than 
2a below yp). 


3.9: A woman 6 feet tall has xy = “5,44 = 2.79. A man 6 
feet tall has zy = 43,4 = 0.87. The woman is much taller 
relative to other women. 


3.11: Let x be the monsoon rainfall in a given year. (a) x S 
697 mm corresponds to z = @45*" = —1.89. Table A gives 
0.0294 = 2.94%. (b) 682 <x < 1022 corresponds to @ 53” 
<z < eS or -2.07 < zx < 2.07. From Table A, 0.9808 
— 0.0192 = 0.9616 = 96.16%. 


3.13: (a) Using Table A, we find this value has zy = 0.39 
(software gives z = 0.3853). (b) We want a proportion of 
0.80 below. Using Table A, x = 0.84 (software gives x = 
0.8416). 


Normal, Mean = O, StDev = 1 


0.3853 0.8416 


3.15: (c) Economic variables such as income and house 
prices are usually right-skewed. 


3.17: (b) The curve is centered at 2. 
3.19: (b) 266 + 2(16) = 234 to 298 days. 
3.24: (a) 7 = BH = 2.13 
3.23: (b) 0.1056 
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3.25: Sketches will vary, but should be some variation on the 
one shown here: the peak at 0 should be “tall and skinny,” 
while near 1, the curve should be “short and fat.” 


3.27: 70 is 2o below p (that is, has z = 
of adults have WAIS scores below 70. 


—2), so about 2.5% 


T T T T 
55 70 85 100 115 130 145 


3.29: (a) Looking up 0.3000 in Table A, zy = —0.52. 
(Software gives zy = —0.5244.) (b) If 35% are more than z, 65% 
are less than or equal to z; zy = 0.39. (Software gives 


z = 0.3853.) 


3.31: x <5.0 corresponds toz < *29357% = —0.80; Table A 
gives 0.2119. 


3.33: 0.8725 < x < 0.8775 corresponds to S43 —S8D0 < 
z< aa oP or —2.08 < x < 2.08. Fest, “Table A, 


0.9812 — 0.0188 = 0.9624. 


3.35: Cars with better mileage correspond to x > 28, zx > 
e722 = 1.12.1 — 0.8686 = 0.1314, or 13.14%. 


3.37: Q, and Q; have zy = —0.67 and z = 0.67, respectively. 
Q; = 22.2 — (0.67)(5.2) = 18.72 mpg and Q; = 22.2 + 
(0.67) (5.2) = 25.68 mpg. 


3.39: (a) Larry’s arm has zx = 44,54 = —0.83. Using 
Table A, his percentile is 0.2033 (about the 20th). (b) 
Answers will vary due to variation in students’ arm lengths. 


3.41: We want the proportion corresponding to x > 69.4 
inches. This corresponds to zx > @45% = 1.86. From 
Table A, 1 — 0.9686 = 0.0314, or 3. 1496, 


3.43: (a) For men, X > 750 corresponds to x > 
bo! = 1.81. 1 — 0.9649 = 0.0351. (b) For women, X > 
750 ‘abriespoad: to x > BS = 2.24. 1 — 0.9875 = 
0.0125. 


3.45: (a) About 0.6% of healthy young adults have 
osteoporosis (the cumulative probability below xy = —2.5). 
(b) The BMD level 2.50 below the young adult mean 
would be x = —0.5 for these older women. Table A gives 
0.3085. 


3.47: (a) 170,777/1,799,243 = 0.0949, or 9.49%. (b) There 
are 56,351 + 170,777 = 227,128 students with ACT score 
28 or higher. 227,128/1,799,243 = 0.1262, or 12. 62%, (c) If 
x is the ACT score, x > 28 corresponds to z > Be ae = 131; 
The proportion is 1 — 0.9049 = 0.0951, or 9.51%. 


3.49: (a) A histogram is provided and is roughly symmetric. 


2013 GSU Math SAT Scores 


400 + 
Mean 543.2 
StDev 61.69 
300 4 
> 
i>) 
<= 
S$ 
= 2004 
7) 
- 
Le 
100 4 
0) — 


(b) Mean = 543.2, Median =540, Standard deviation = 
61.69, Q; = 500, Q; = 580. The mean and median are close, 
and the distances from each quartile to the median are equal. 
These results are consistent with a Normal distribution. (c) If 
x is the score of a randomly selected GSU entering student, 
then we are assuming x has the N(543.2, 61.69) distribution. 
The proportion of GSU students scoring higher than 514 
corresponds to x > 514, or zy > “45284 = -0.47, or 1 - 
0.3192 = 0.6808, or 68.08%. (d) 350 one 510 or less, so 
2718 — 859 = 1859 entering GSU students scored higher than 
514, which is 1859/2718 = 0.6840, or 68.4%. The nominal 
Normal probability in part (c) fits the actual data well. 


3.51: (a) 14/548 = 0.0255 (2.55%) weighed less than 100 
pounds. x < 100 corresponds to z < “?gget8 = -1.26. 
Using Table A, the area is 0.1038 (10.38%). (b) 33/548 = 
0.0602 (6.02%) weighed more than 250 pounds. x > 250 cor- 
responds to z > *°ggg¢!?® = 1.81. Using Table A, about 1 — 
0.9649 = 0.0351 (3.51%) would weigh more than 250 pounds. 
(c) The Normal distribution model predicts 10.38% of women 
weigh less than 100 pounds, while actually about 2.55% do. 
This is a substantial error since the Normal model also pre- 
dicts 3.51% of values more than 250, where we actually 
observed 6.02% more than 250. These data seem to be far 
from Normal. 


3.53: Because the quartiles of any distribution have 
50% of observations between them, place the flags so that 
the reported area is 0.5. The closest the applet gets is an 
area of 0.4978, between —0.671 and 0.671. The quartiles 
of any Normal distribution are about 0.670 above and 
below p. 


Chapter 4 Scatterplots and Correlation 


4.1: (a) Explanatory: number of lectures attended; 
response: grade on final exam. (b) Explanatory: time exercis- 
ing; response is calories burned. (c) Explanatory: Time spent 
online using Facebook; GPA is the response. (d) Explore the 
relationship. 


4.3: For example: weight, sex, other food eaten by the stu- 
dents, type of beer (light, imported, . . .). 


4.5: Outsource percent is the explanatory variable. These 
data do not support concerns of the critics. 


Delay vs. Outsource 
254 


Delay 


10+ 


0 20 40 60 80 
Outsource 


4.7: One could consider there to be two outliers; Frontier 
has an unusually low outsourcing percent and a high delay 
percent; Hawaiian has a very high outsourcing percent and a 
very low delay percent. Without Frontier, there would be a 
decreasing relationship that is approximately linear and mod- 
erately strong. Without Hawaiian, there is really no relation- 
ship, but two sets of points: five airlines with high delays, 
which don’t seem to depend on outsourcing, and three airlines 
with low delay percentages that again don’t seem to depend 
on outsourcing. 


4.9: (a) Caution counties are marked with squares, the oth- 
ers with circles. (b) For both types of counties, there appears to 
be no relationship. The caution counties form a band in the 
lower portion of the graph, while the non-caution counties 
form a band in the upper portion. 
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Homicide vs. Suicide Rates for 26 Ohio Counties 
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4.11: r would not change; units do not affect correlation. 


4.13: x = 50 mph, s, = 15.8114 mph, y = 26.8 mpg, and 
sy = 2.6833 mpg. Refer to the table: note that r = 0/4 = 0. 
r = O because these variables do not have a straight-line 
relationship. Correlation only measures the strength and 
direction of a linear relationship between two variables. 
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4.15: (c) The association should be negative (if slower 
reaction times mean less time to death). 


4.17: (a) 0.9. Without the outlier, there is a strong positive 
linear relationship. 
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4.19: (c) Acorrelation close to 0 might arise from a scatter- 
plot with no visible pattern, but there could be a nonlinear 
pattern. See Exercise 4.13, for example. 


4.21: (a) 1. 
Examl — 10. 


The line would exactly be Exam2 = 


4.23: (b) Computation with calculator or software gives 
r = 0.298. 


4.25: (a) Overall, there is a slightly negative association 
between these variables. (b) There is general disagreement— 
low BRFSS scores correspond to greater happiness, and these 
are associated with higher-ranked states (the least happy 
states, according to the objective measure). (c) It is hard to 
declare any of the data values as “outliers.” 


4.27: (a) The scatterplot suggests a strong positive linear 
association. 


Ebola Spread in Gorillas 
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(b) r = 0.9623. This is consistent with the pattern seen in 
part (a). (c) Correlation would not change, since it does not 
depend on units. 


4.29: (a) The scatterplot is shown; note that neural activity 
is explanatory. (b) The association is moderately strong, posi- 
tive, and linear. The outlier is in the upper right corner (behav- 
ioral score 155.2). (c) For all points, r = 0.8486. Without the 
outlier, r = 0.7015. The correlation is greater with the outlier 
because it fits the pattern of the other points. 


Does Behavior Respond to Brain Activity? 
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4.31: (a) The scatterplot is shown. (b) The plot suggests that 
there is a strong relationship between alcohol intake and 
relative risk of breast cancer. It seems that type of alcohol has 
nothing to do with the increase since the same pattern and 
rate of increase is seen for both groups. 


Alcohol Consumption and Breast Cancer Risk 
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4.33: (a) The plot suggests that “Good” weather reports tend 
to yield higher tips. (b) The explanatory variable is categorical, 
so r cannot be used. 
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4.35: (a) The scatterplot is provided. Set B has stretched out 
the x-values, but the pattern is still the same. (b) Units do not 
impact correlation. For both data sets, r = 0.298. 


Percent Body Fat vs. Preferred Salt 
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4.37: (a) Small-cap stocks have a lower correlation with 
municipal bonds, so the relationship is weaker. (b) She should 
look for a negative correlation (although this would also mean 
that this investment tends to decrease when bond prices rise). 


4.39: (a) Because sex has a nominal scale, we cannot com- 
pute r. There is a strong association between sex and income. 
(b) r = 1.09 is impossible, because r is restricted to be between 
—1and 1. (c) Correlation has no units, so “r = 0.63 centime- 
ter” is incorrect. 


4.41: (a) The correlation will be closer to 1. (b) Answers will 
vary, but r will decrease, and can be made negative by dragging 
the point down far enough. 


4.43: PLAN: To investigate global warming, create a scatter- 
plot and look for an increasing (positive) pattern. SOLVE: The 
plot suggests that temperatures have been increasing overall, 
but there seems to have been a slowing in the past few years; this 
graph looks curved. Correlation may not be a useful measure 
here; r = 0.6334. CONCLUDE: Over time, average global tem- 
peratures have increased, but the increase may not be linear. 
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4.45: PLAN: Plot heat loss from the beak against tempera- 
ture. Compute the correlation, if the relationship is reasonably 
linear. SOLVE: The plot is shown. There is a reasonably strong 
linear relationship, so we can use correlation to describe this 
relationship. r = 0.9143. CONCLUDE: When the tempera- 
ture increases, a greater percentage of total heat loss is due to 
beak heat loss. That is, the beak plays a more important role in 
cooling down the toco toucan as the weather becomes hotter. 


Toco Toucan's Beak and Cooling 
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4.47: PLAN: Begin with a scatterplot, and compute the cor- 
relation if appropriate. SOLVE: A scatterplot shows a moder- 
ately strong, positive, linear association. The point at the 
upper right (5.3, 3.4) may be an outlier. This point seems to 
make the linear relationship appear more positive. Including 
the possible outlier, r = 0.564. If the outlier is omitted, r = 
0.4406. CONCLUDE: The positive association supports the 
idea that squirrel populations increase when the pine cone 
supply is higher in the previous autumn. However, the rela- 
tionship is somewhat weak; squirrels in the Yukon may have 
other good food sources. 


3.55 
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Chapter 5 Regression 


5.1: (a) The slope is 1.033. On average, highway mileage 
increases by 1.033 mpg for each additional 1 mpg change in 
city mileage. (b) The intercept is 6.785 mpg. This is the high- 
way mileage for a nonexistent car that gets 0 mpg in the city. 
(c) For a car that gets 16 mpg in the city, we predict 6.785 + 
(1.033) (16) = 23.31 mpg. For a car that gets 28 mpg in the 
city, we predict 6.785 + (1.033)(28) = 35.71 mpg. (d) The 
regression line passes through all the points of prediction. 
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5.3: (a) The slope is 1021. For each year since 2000, forest loss 
averages about 1021 km/. (b) In square meters, the slope would 
be 1021 X 10° = 1,021,000,000, a loss of 1 billion square meters 
per year (on average). In thousands of square kilometers, the 
slope would be 1.021; a loss of a bit more than 1000 km? per 
year (on average). Units matter in regression. 


5.5: (a) The scatterplot (with the regression line) is shown. This 
relationship is certainly weak. (b) Suicide = 11.125 + 0.195* 
Homicide (from software). (c) The slope means that for every 
suicide (per 100,000 people), there are about 0.195 homicides 
(per 100,000 people) in these Ohio counties. (d) 11.125 + 
0.195(8.0) = 12.685 suicides. 
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Homicide 


5.7: (a) The scatterplot is provided, with the regression line. 
y = 1.0284 — 0.004498x. The plot suggests a slightly curved 
pattern, not a strong linear pattern. A regression line is not 
useful for making predictions. (b) r? = 0.031. This confirms 
what we see in the graph: the regression line does a poor job 
summarizing the relationship. Only about 3% of the variation 
in growth rate is explained by the least-squares regression on 
difference in begging intensity. 
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5.9: (a) Plot is provided following, top. (b) The pattern 
is curved, so linear regression is not appropriate. (c) For x = 
10, y = 11.058 — 0.01466(10) = 10.91, so the residual is 
21.00 — 10.91 = 10.09. The sum of the residuals is — 0.01. (d) 
The first two and last four residuals are positive, and those in 
the middle are negative. Plot following, bottom. 
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5.11: (a) Any point that falls exactly on the regression line 
will not increase the sum of squared vertical distances, so the 
regression line does not change. Any other line (even if it 
passes through this new point) will necessarily have a higher 
total sum of squared prediction errors. r increases because the 
new point reduces the relative scatter about the regression 
line. (b) Influential points are those whose x-coordinates are 
outliers. The regression line will “follow” an influential point if 
it is moved up or down in the y direction. 


5.13: (a) Hawaiian Airlines is the point identified with 
“H.” Since this point is an outlier and falls outside the x range 
of the other data points, it is influential. (b) With the outlier, 
r = —0.488. If the outlier is deleted, r = —0.241. (c) The two 
regression lines are plotted. The line based on the full data 
set has been pulled down toward the outlier, indicating that 
the outlier is influential. Based on the complete data set, 
y = 21.815 — 0.12851x. Using this, when x = 79.1, we predict 
11.65% delays. Omitting the outlier, y = 19.460 — 0.05528x, 
so we predict 15.09% delays. The outlier impacts predictions 
because it impacts the regression line. 
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5.15: (a) } = —44.831 + 0.1323x (or Kills = —44.831 + 
0.1323 Boats). (b) If890,000 boats are registered, x = 890, and 
y = —44.831 + (0.1323)(890) = 72.92 manatees killed. The 
prediction seems reasonable, as long as conditions remain the 
same, because “890” is within the space of observed values of 
x. (c) Ifx = 0, then we would “predict” —44.831 manatees to 
be killed by boats. This is absurd, because it is clearly impossi- 
ble. This illustrates the folly of extrapolation ...x = 0 is well 
outside the range of observed values of x on which the regres- 
sion line was based. 


5.17: Possible lurking variables include the IQ and socioeco- 
nomic status of the mother. These variables are associated with 
smoking in various ways, and are also predictive of a child’s IQ. 


5.19: One example might be that men who are married, 
widowed, or divorced may be more “invested” in their careers 
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than men who are single. There is still a feeling of societal pres- 
sure for a man to “provide” for his family. 


5.21: (b) 0.2. Consider two points on the regression line— 
say, (90, 4) and (130, 11). The slope of the line segment con- 
necting these points is g—¢5 = 7/40 = 0.175. 


5.23: (a) y = 1000 + 100x 

5.25: (c) 16 cubic feet 

5.27: (a) The slope of the line is positive. 

5.29: (a) y = 24.2 + 6.0x 

5.31: (a) The least-squares regression line says that increas- 


ing the size of a diamond by 1 carat increases its price by 
3721.02 Singapore dollars, on average. (b) A diamond of size 0 
carats would have a predicted price of 259.63 Singapore dol- 
lars. This is probably an extrapolation, since the data set on 
which the line was constructed almost certainly had no rings 
with diamonds of size 0 carats. 


5.33: (a) y = 0.919 + 2.0647x. For every degree Celsius, the 
toucan will lose about 2.06% more heat through its beak. (b) 
y = 0.919 + 2.0647(25) = 52.5. Ata temperature of 25 degrees 
Celsius, we predict a toucan to lose 52.5% more heat through its 
beak, on average. (c) r= Vr? = V0.836 = 0.914. Correlation is 


positive because the regression line has a positive slope. 


5.35: (a) b = 15,/s, = (0.5)(§) = 0.1, and a=¥—bx = 
75 — (0.1)(280) = 47. The regression equation is 
y = 47+ 0.1x. Each point of pre-exam total score means 
an additional 0.1 points on the final exam, on average. 
(b) ¥ = 47 + (0.1)(300) = 77. (c) Withr = 0.5, r = (0.5)? 
0.25, so the regression line accounts for only 25% of the vari- 
ability in student final exam scores; the regression line doesn’t 
predict final exam scores very well. 


5.37: (a) y = 28.037 + 0.521x. r = 0.555. The plot is pro- 
vided. (b) y = 28.037 + (0.521)(70) = 64.5 inches (rounded). 
This prediction isn’t expected to be very accurate because the 
correlation isn’t very large; r’ = (0.555)? = 0.308. 
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5.39: (a) y = 31.934 — 0.304x. (b) On the average, for 
each additional 1% increase in returning birds, the number 
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of new birds joining the colony decreases by 0.304. (c) When 
x = 60, we predict y = 13.69 new birds. 


Sparrowhawk Colonies 


WAS) =| 
20.0 4 
AES) =| 
15.0 4 
125) 5 
10.0 + 


Number of New Adults 
N 
Ww 


vi 
@) 
! 


40 50 60 70 80 
Percent Returning 


5.41: (a) The outlier is in the upper-right corner. (b) With 
the outlier omitted, y = 0.586 + 0.00891x. (This is the solid 
line in the plot.) (c) The line does not change much because 
the outlier fits the pattern of the other points; r changes 
because the scatter (relative to the line) is greater with the 
outlier removed. (d) The correlation changes from 0.8486 
(with all points) to 0.7015 (without the outlier). With all 
points included, the regression line is y = 0.585 + 0.0879x 
(nearly indistinguishable from the other regression line). 


Brains and Losses 
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Minitab output - all points 


The regression equation is Behave = 0.585 + 
0.00879 Neural 


Predictor Coef SE Coef 7 P 
Constant 0.58496 0.07093 8325 0.000 
Neural 0.008794 0.001465 6.00 0.000 


Minitab output - outlier removed 


The regression equation is Behave = 0.586 + 
0.00891 Neural 


Predictor Coef SE Coef 7 P 
Constant 0.58581 0.07506 7.80 0.000 
Neural 0.008909 0.002510 33:55 0.004 


5.43: (a) The two unusual observations are indicated on the 
scatterplot. (b) The correlations are 


r, = 0.4819 (all observations) 
r, = 0.5684 (without Subject 15) 
+3 = 0.3837 (without Subject 18) 


Both outliers change the correlation. Removing Subject 15 
increases r because its presence makes the scatterplot less lin- 
ear. Removing Subject 18 decreases r because its presence 
decreases the relative scatter about the linear pattern. 


Managing Diabetes 
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5.45: The scatterplot with regression lines added is given. 
The equations are 


= 66.4 + 10.4x (all observations) 
= 69.5 + 8.92x (without #15) 
= 52.3 + 12.1x (without #18) 


2 o2o’> 


While the equation changes in response to removing either 
subject, one could argue that neither one is particularly influ- 
ential, because the line moves very little over the range of x 
(HbA) values. Subject 15 is an outlier in terms of its y-value; 
such points are typically not influential. Subject 18 is an outlier 
in terms of its x-value, but it is not particularly influential 
because it is consistent with the linear pattern suggested by the 
other points. 
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5.47: The correlation would be smaller. Individual weight 
will vary much more than the average weight for a given 


height. 


5.49: Responses will vary. For example, students who choose 
the online course might have more self-motivation or have 
better computer skills. 


5.54: (a) MathSAT = 471.82 + 0.00048(TeachSal). For 
these states, increasing the average teacher salary increases 
the mean Math SAT score by 0.00048 points, on average. 
(b) MathSAT = 472.8 + 0.0020(TeachSal). For these states, 
increasing the average teacher salary increases the mean 
Math SAT score by 0.0020 points, on average. (c) The slopes 
here have opposite signs from that found in Exercise 5.50. 
Consideration of a third (lurking) variable changed the 
relationship. 


5.53: Here is a simple example to show how this can hap- 
pen: suppose that most workers are currently 30 to 50 years 
old. Suppose further that each worker’s current salary is 
his/her age (in thousands of dollars). Over the next 10 years, 
all workers age, and their salaries increase. Suppose every 
worker’s salary increases by between $4000 and $8000. Then 
every worker will be making more money than he/she did 10 
years before, but less money than a worker of that same age 
10 years before. 


5.55: For a_ player who shot 80, we predict 
y = 56.47 + (0.243)(80) = 75.91. For a player who shot 70, 
y = 56.47 + (0.243)(70) = 73.48. The player who shot 80 
the first round (worse than average) is predicted to have a 
worse-than-average score the second round, but better 
than the first round. The player who shot 70 the first round 
(better than average) is predicted to do better than average 
in the second round, but not as well (relatively) as in the 
first round. Both players are predicted to “regress” to the 
mean. 


5.57: A regression line is appropriate only for data set 
(b). For data set (c), the point not in the vertical stack is 
very influential—the stacked points alone give no indication 
of slope for the line. The curved relationship exhibited by 
(d) clearly indicates that predictions based on a straight line 
are not appropriate. 


5.59: PLAN: We construct a scatterplot (with beaver stumps 
as the explanatory variable), and, if appropriate, find the 
regression line and correlation. SOLVE: The scatterplot shows 
a positive linear association. Regression seems to be appropri- 
ate; y = —1.286 + 11.89x. The straight-line relationship 
explains r? = 83.9% of the variation in beetle larvae. CON- 
CLUDE: The strong positive association supports the idea that 
beavers benefit beetles. 
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Do Beavers Help Beetles? 


Larvae 


Stumps 


5.61: PLAN: We construct a scatterplot, with Forecast as the 
explanatory variable, and Actual as the response variable. If 
appropriate, we find the least-squares regression line. We con- 
sider the impact of the potential outlier (2005 season). SOLVE: 
The scatterplot shows a reasonable but not very strong linear 
relationship. In recent years, it seems that there is no relation- 
ship between forecast and actual (a flat line). The 2005 season 
is influential, pulling the regression line somewhat. We might 
consider deleting this point and fitting both lines. Deleting the 
point, we obtain the solid regression line, y = 2.753 + 0.7964 
Forecast when the original equation is y = 1.668 + 0.920 
Forecast. If the forecasts were perfect, the intercept of this line 
would be 0, and the slope would be 1. Deleting the 2005 sea- 
son, r = 0.628, and r? = 39.4%. Even after deleting the outlier, 
the regression line explains only 39.4% of variation in number 
of hurricanes. CONCLUDE: Predictions using the regression 
line are not very accurate. However, there is a positive associa- 
tion... so a forecast of many hurricanes may reasonably be 
expected to forebode a heavy season for hurricanes. 


Dr. Gray's Observed vs Forecast 
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5.63: PLAN: We plot marathon times by year for each sex, 
using different symbols. If appropriate, we fit least-squares 
regression lines for predicting time from year for each gender. 
We then use these lines to guess when the times will agree. 
SOLVE: The scatterplot is provided with regression lines 
plotted. The regression lines are: 


For men: y = 66,072 — 29.535x 
For women: y = 182,976.15 — 87.73x 
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Although the lines appear to fit the data reasonably well (and the 
regression line for women would fit better if we omitted the outlier 
associated with year 1926), this analysis is inviting you to extrapo- 
late, which is never advisable. CONCLUDE: Using the regression 
lines plotted, we might expect women to “outrun” men by the year 
2009. Omitting the outlier, the line for women would decrease 
more steeply, and the intersection would occur sooner, by 1995. 
We'll note that as of July 2014, this has not happened. 


Marathon Record Times 
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Chapter 6 Two-Way Tables 


6.1: (a) 736 + 450 + 193 + 205 + 144 + 80 = 1808 people. 
736 + 450 + 193 = 1379 played video games. (b) (736 + 
205)/1808 = 0.5205 = 52.05% earned As and Bs. Do this for 
all three grade levels. The complete marginal distribution for 
grades is 


GRADE PERCENT 
As and Bs 52.05% 
Cs 32.85% 
Ds and Fs 15.10% 


32.85% + 15.10% = 47.95% received a grade of C or lower. 


6.3: There are 736 + 450 + 193 = 1379 players. Of these, 
736/1379 = 53.37% earned As or Bs. There are 205 + 144 + 
80 = 429 nonplayers. Of these, 205/429 = 47.79% earned As 
or Bs. Continuing in like manner, the conditional distributions 
of grades follows: 


GRADES PLAYERS NONPLAYERS 
As and Bs 53.37% 47.79% 
Cs 32.63% 33.57% 
Ds and Fs 14.00% 18.65% 


If anything, players have slightly higher grades than nonplay- 
ers, but this could be due to chance. 


6.5: Two examples are shown. In general, choose a to be any 
number from 10 to 50, and then all the other entries can be 
determined. 

30 20 50 0 

30 20 10 40 


6.7: (a) For Rotorua district, 79/8889 = 0.0089, or 0.9%, of 
Maori are in the jury pool, while 258/24,009 = 0.0107, or 1.07%, 
of the non-Maori are in the jury pool. For Nelson district, the 
corresponding percents are 0.08% for Maori and 0.17% for non- 
Maori. In each district, the percent of non-Maori in the jury pool 
exceeds the percent of Maori in the jury pool. 


MAORI NON-MAORI 
In jury pool 80 314 
Not in jury pool 10,138 56,353 
Total 10,218 56,667 


(b) Overall, 80/10,218 = 0.0078, or 0.78%, of Maori are in the 
pool, while 314/56,667 = 0.0055, or 0.55%, of non-Maori are 
in the pool. Overall the Maori have a larger percent in the jury 
pool, but in each region they have a lower percent in the jury 
pool. (c) The reason is that the Maori constitute a large pro- 
portion of Rotorua’s population, while in Nelson they are a 
small minority community. 


6.9: (b) 150 teens in schools that forbid cell phones 
6.11: (a) the marginal distribution of school permissiveness 


6.13: (c) the conditional distribution of the frequency that 
a teen brings a cell phone to school among the schools that 
forbid cell phones 


6.15: (b) the conditional distribution of school permissive- 
ness among those who brought their cell phone to school every 
day 


6.17: (b) an example of Simpson’s paradox 


6.19: For each type of injury (accidental, not accidental), the 
distribution of ages follows. 


Weight Lifting Injuries by Type and Age 
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ACCIDENTAL NOT ACCIDENTAL 
8-13 19.0% 4.0% 
14-18 42.2% 35.8% 
19-22 15.4% 20.8% 
23-30 23.4% 39.4% 


Among accidental weight-lifting injuries, the percentage of 
younger lifters is larger, while among the injuries that are not 
accidental, the percentage of older lifters is larger. 


6.21: The percent of single men with no income is 
513/4938 = 0.1039, or 10.39%. The percent of men with no 
income that are single is 513/1918 = 0.2675, or 26.75%. 


6.23: (a) We need to compute percents to account for more 
married men than single men. (b) A table is provided; descrip- 
tions of the relationship may vary. For example, married men 
made up less of the no income distribution compared to the 
$100,000 and up distribution. 


SINGLE MARRIED DIVORCED WIDOWED 
No income 26.75% 51.87% 20.07% 1.30% 
$100,000 and up 4.76% 85.43% — 8.99% 0.81% 


6.25: (a) The two-way table of race (white, black) versus 
death penalty (death penalty, no death penalty) follows: 


WHITE DEFENDANT BLACK DEFENDANT 
Death penalty 19 17 
No death penalty 141 149 


(b) For black victims, white defendants were given the 
death penalty in 0/9 = 0, or 0%, of cases. Black defendants 
were given the death penalty in 6/103 = 0.058, or 5.8%. 
For white victims, white defendants were given the death 
penalty in 19/151 = 0.126, or 12.6%, of cases, and black 
defendants were given the death penalty 11/63 = 0.175, or 
17.5%, of the time. For both victim races, black defendants 
are given the death penalty relatively more often than 
white defendants. However, overall, 19/160 = 0.119, or 
11.9%, of white defendants got the death penalty, while 
17/166 = 0.102, or 10.2%, of black defendants got the 
death penalty. (c) For white defendants, (19 + 132)/(19 + 
132 +0+ 9) = 0.9438 = 94.4% of victims were white. For 
black defendants, (11 + 52)/(11 +52 + 6 + 97) = 0.3795, 
or 37.95%, of victims were white. The death penalty was 
predominantly assigned to cases involving white victims: 
14.0% of all cases with a white victim, and only 5.5% of all 
cases with a black victim, had a death penalty assigned. 
Because most white defendants’ victims are white, and 
cases with white victims carry additional risk of a death 
penalty, white defendants are being assigned the death pen- 
alty more often overall. 
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6.27: PLAN: Find and compare the conditional distribu- 
tions of outcome for each treatment. SOLVE: The percent- 
ages for each column are provided. For example, for Chantix, 
the percentage of successes is 155/(155 + 197) = 0.4403, or 
44.0%. 


CHANTIX BUPROPION PLACEBO 


44.0% 29.5% 17.7% 


Percent not smoking in weeks 9-12 


CONCLUDE: Clearly, a larger percent of subjects using 
Chantix were not smoking during weeks 9-12, compared with 
results for either of the other treatments. 


6.29: PLAN: Calculate and compare the conditional distri- 
butions of sex for each degree level. SOLVE: For example, 
646/(646 + 383) = 0.6278, or 62.78% of associate’s degrees 
went to women. The table shows the percent of women at 
each degree level, which is all we need. CONCLUDE: Women 
constitute a substantial majority of associate’s, bachelor’s, and 
master’s degrees, and a small majority of doctor’s and profes- 
sional degrees. 


DEGREE % FEMALE 
Associate’s 62.78% 
Bachelor’s 57.88% 
Master’s 61.94% 
Doctor’s and Professional 53.53% 


6.31: PLAN: Find and compare the conditional distributions 
for health for each group. SOLVE: The table provides the per- 
cent of subjects with various health outlooks for each group. 
CONCLUDE: Clearly, the outlooks of current smokers are 
generally bleaker than those of current nonsmokers. Much 
larger percentages of nonsmokers reported being in “excellent” 
or “very good” health, while much larger percentages of smok- 
ers reported being in “fair” or “poor” health. 


HEALTH OUTLOOK 
EXCELLENT VERYGOOD GOOD FAIR’ POOR 
Current smoker 6.2% 28.5% 35.9% 22.3% 7.2% 
Current nonsmoker 12.4% 39.9% 33.5% = 14.0% 0.3% 


6.33: Because the numbers of students who use (or do not 
use) medications are different, we find the conditional distri- 
butions of those who do and do not use medications. Those 
who use medications are less likely to have optimal sleep. This 
is a case where one would not want to ascribe causation: Do 
those who use medications to stay awake have poor sleep qual- 
ity because they use the medication, or do they use the medi- 
cations to stay awake because they had poor sleep quality 
before using them? 
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SLEEP QUALITY 
OPTIMAL BORDERLINE POOR 
Use medications 21.3% 30.5% 48.3% 
Do not use medications 38.2% 26.7% 35.2% 


Chapter 7 Exploring Data: Part | Review 
7.1: (c) 
7.3: (d) 
7.5: (c) 
7.7: (c) 
7.9: (c) 


7.11: (a) centimeters; (b) centimeters; (c) centimeters; 
(d) grams? 


7.13: (d) 43 of 51 observations at least 20 


7.15: (c) About 54% use a search engine at least once a 
day; an additional 15-16% use one at least three times per 
week. 


7.17: (a) 0.800 — 3(0.078) = 0.566 to 0.800 + 3(0.078) = 
1.034 mm (b) 0.878 is la above the mean; so about 16% 
(15.87%) 


7.19: (a) Minimum = 7.2, Q; = 8.5, M = 9.3, Q; = 10.9, 
Maximum = 12.8 (b) M = 27 (c) 25% of values exceed 
Q; = 30 (d) Yes. Virtually all Torrey pine needles are longer 
than virtually all Aleppo pine needles. 


7.21: (a) 

7.23: (c) 

7.25: (d) 

7.27: (d) 

7.29: (d) 

7.31: (c) 

7.33: (a) 

(a) No (b) r’ = 0.64, or 64% 

7.37: (a) 8.683 kg (b) 10.517 kg (c) Such a comparison is 


unreasonable because the lean group is less massive, and 
therefore would be expected to burn less energy on average. 
(d) See scatterplot. (e) It appears that the rate of increase 
in energy burned per kilogram of mass is about the same for 
both groups. The obese monkeys burn less energy than the 
lean monkeys, because their points tend to be below the 
others. 


7.35: 


Comparing Energy Spent for Lean and Fat Monkeys 
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7.39: (a) 190/8474 = 0.0224, or 2.24% (b) 633/8474 = 
0.0747, or 7.47% (c) 27/633 = 0.0427, or 4.27% (d) 4621/8284 
= 0.5578, or 55.78% (e) The conditional distribution of CHD 
for each level of anger is tabulated below. The result for the 
high anger group was computed in part (c). Clearly, angrier 
people are at greater risk of CHD. 


LOW ANGER MODERATE ANGER HIGH ANGER 


1.70% 2.33% 4.27% 

7.41: The time plot shows a lot of fluctuation from year to 
year, but also shows a recent increase: prior to 1972, the dis- 
charge rarely rose above 600 km’, but since then, it has 
exceeded that level more than half the time. A histogram or 


stemplot cannot show this change over time. 


Mississippi River Discharge 
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7.43: (a) The plot is provided. (b) ¥ = 144.79 — 0.0659x. 
The slope is negative, suggesting that the ice breakup day is 
decreasing (by about 0.0659 days per year). (c) The regres- 
sion line is not very useful for prediction, as it accounts for 
only about 9% (r? = 0.0903) of the variation in ice breakup 
time. 


Tanana River Tripod Falling Dates 
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7.45: (a) A histogram is provided. (b) Luxembourg is the 
outlier with average starting salary $65,171. Otherwise, the 
distribution is fairly symmetric. (c) x = $27, 381.4 and s = 
$9651.78. The five-number summary is Min = $9,526, Q, = 
$23,130, M = $28,328, Q; = $32,629, Max = $46,456. 
(d) The U.S. average starting salary ($36,858) is above the 
third quartile. Only six countries pay their teachers a higher 
starting salary. 


2012 Teacher's Salaries Around the World 


Frequency 


10000 20000 30000 40000 50000 60000 70000 
Salary (US$) 


7.47: PLAN: Graph the data and compute appropriate 
numerical summaries. SOLVE: A stemplot is shown. The dis- 
tribution seems to be fairly Normal apart from a high outlier of 
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50°. The five-number summary is preferred because of the out- 
lier: Min = 13°, Q, = 20°, M = 25°, Q; = 30°, Max = 50°. 
CONCLUDE: Student descriptions of the distribution will 
vary. Most patients have a deformity angle in the range of 15° 
to 35°. 
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7.49: PLAN: We examine the relationship with a scatterplot 
and (if appropriate) correlation and regression line. SOLVE: MA 
angle is the explanatory variable. The scatterplot shows a moder- 
ate to weak positive linear association, with one clear outlier 
(HAV angle 50°). r = 0.302, and y = 19.723 + 0.3388x. CON- 
CLUDE: MA angle can be used to give estimates of HAV angle, 
but the variability is so large that estimates would not be very 
reliable. The linear relationship explains only 17 = 9.1% of the 
variation in HAV angle. 


will MA Predict HAV? 
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7.51: PLAN: We will examine the relationship with a scatter- 
plot and (if appropriate) correlation and regression line. SOLVE: 
The scatterplot, shown with the line y = 70.44 + 274.78x, 
shows a moderate, positive linear relationship. The linear rela- 
tionship explains about r’ = 49.3% of the variation in gate veloc- 
ity. CONCLUDE: The regression formula might be used as a rule 
of thumb for new workers to follow, but the large variability in the 
scatterplot suggests that there may be other factors that should be 
taken into account. 
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Metal Casting 
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7.53: (a) The scatterplot of 2003 returns against 2002 
returns shows (ignoring the outlier) a strong negative associa- 
tion. (b) For all 23 points, r = —0.616; with the outlier 
removed, r = —0.838. The outlier deviates from the linear 
pattern of the other points; removing it makes the negative 
association stronger, and so r moves closer to —1. (c) Regres- 
sion formulas are given. The first line is solid; the second is 
dashed. The line must pivot up toward Fidelity Gold in order 
to minimize the sum of squares for all 23 deviations. Fidelity 
Gold is very influential. 


Fidelity Sector Funds Returns 
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r EQUATION 
All 23 funds —0.616 y = 31.117 — 0.4132x 
Without Gold —0.838 y = 21.462 — 0.8403x 


7.55: (a) Fish catch is the explanatory variable. The point for 
1999 is at the bottom of the plot. (b) Correlations are given in 
the table provided. The outlier decreases r because it weakens 
the strength of the association. (c) The two regression lines are 


given; the solid line uses all points, while the dashed line omits 
the outlier. The effect of the outlier on the line is small: there 
are several other years with similar changes in bushmeat bio- 
mass. Also, this year was not particularly extreme in the 
amount of fish caught. 


Scatterplot of Change vs. Fish 
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Chapter 8 Producing Data: Sampling 


8.1: (a) The population is (all) college students. (She wanted 
to know about all college students.) (b) The sample is the 
104 students at the researcher’s college who returned the 
questionnaire. Because she only has her own college to 
sample from, the population she can make conclusions about is 
students at her college. 


8.3: (a) The population is all users of the software. Unless 
their market is primarily educational, the 1100 individuals 
(mostly faculty) will not represent the population. (b) The 
sample is the 186 people who completed the survey. 


8.5: Because all the students surveyed are enrolled in a spe- 
cial senior honors class, these students may be more likely to 
be interested in joining the club. The direction of bias is to 
overestimate the proportion of all psychology majors willing 
to pay to join this club. This is a convenience sample. 


8.7: Number from 01 to 26 alphabetically. With the applet: 
Population = 1 to 26, select a sample of size 5, then click 
Reset and Sample. With Table B, enter at line 141 and choose 
23 = Rodriguez, 12 = Gemayel, 16 = Ippolito, 25 = Sgambel- 
lone, and 02 = Ahmadiani. 


8.9: With the election soon, the polling organization wants 


to increase the accuracy of its results. Larger samples provide 
better information. 


8.11: Label the suburban townships from 01 to 30 alphabeti- 
cally. With Table B line 116, choose 14 = New Trier, 03 = 
Bloom, 10 = Lemont, 22 = Proviso, and 06 = Cicero. Next, 
label the Chicago townships from 1 to 8, down the columns. 
With Table B, enter at line 126 and choose 6 = Rogers Park, 
2 = Jefferson, and 7 = South Chicago. 


8.13: (a) The population is all physicians practicing in 
the United States. The sample size is n = 2379. If the 2379 
were randomly selected, we could draw conclusions, but 
there was too much nonresponse. (b) The nonresponse rate is 
sO = 97.62%. We don’t know the attitudes of the 
nonrespondents, so the results may not be credible. (c) They 
only received 2379 responses. 


8.15: (a) The sample was not randomly selected. (b) The 
true percentage is most likely lower. People who don’t listen 
often probably don’t visit the NPR Facebook page. 


8.17: (a) 
8.19: (b) 
8.21: (b) Plots are stratified by terrain. 
8.23: (c 
8.25: (b 


) In part (b) “O07” appears in the sample twice. 
) People over 65 are a subset of the original sample. 


8.27: The population is the 1000 envelopes stuffed during a 
given hour. The sample is the 40 envelopes selected. 


8.29: With the applet: Population = 1 to 287, select a sam- 
ple of size 20, then click Reset and Sample. Using Table B, 
number the area codes 001 to 287. Enter at line 122, and select 
area codes labeled 138, 159, 052, 087, and 275. 


8.31: The questions were worded very differently. The U.S. 
has the second Amendment that allows guns. Canada has very 
different gun laws; gun ownership is generally forbidden. Given 
these facts, opinions would differ in any case (even if the ques- 
tions were worded the same). 


8.33: Online polls, call-in polls, and voluntary response polls 
in general tend to attract responses from those who have strong 
opinions. On the other hand, there is no reason to believe that 
randomly chosen adults would overrepresent any particular 
group; those give a more reliable picture of public opinion. 


8.35: (a) Assign labels 0001 through 5024, enter the table at 
line 114, and select: 4514, 0381, 0202, 0915, and 1776. (b) 
More than 171 respondents have run red lights. We would not 
expect very many people to claim they have run red lights 
when they have not, but some people will deny running red 
lights when they have. 


8.37: (a) Each person has a 10% chance: 4 of 40 men, and 3 
of 30 women. (b) This is not an SRS because not every group 
of 7 people can be chosen; the only possible samples are those 
with 4 men and 3 women. 


8.39: (a) How the sample was obtained can contribute to 
bias in the results, if not done randomly and fairly. We need to 
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try to avoid undercoverage, for example. (b) Answers will vary. 
For example, exactly how the 655 Internet users were selected 
is not given. 


8.41: Sample separately in each stratum; that is, assign sepa- 
rate labels, choose the first sample, then continue on in the 
table to choose the next sample, etc. Beginning with line 112 
in Table B, we choose: 


FOREST TYPE LABELS PARCELS SELECTED 
Climax 1 O1 to 36 04, 11, 19, 35 

Climax 2 O1 to 72 27, 30, 57, 62, 56, 02, 06 
Climax 3 01 to 31 08, 02, 25 

Secondary Ol to 42 11, 17, 14, 29 


8.43: (a) Because 200/5 = 40, choose one of the first 40 
names at random. Beginning on line 120, the addresses 
selected are 35, 75, 115, 155, and 195. (Only the first number 
is chosen from the table.) (b) All addresses are equally likely; 
each has chance 1/40 of being selected. This is not an SRS 
because the only possible samples have exactly one address 
from the first 40, one address from the second 40, and so on. 
An SRS could contain any five of the 200 addresses in the 
population. 


8.45: (a) This design omits households without tele- 
phones, those with only cell phones, and those with unlisted 
numbers. Such households would likely be made up of poor 
individuals (who cannot afford a phone), those who choose 
not to have landline phones, and those who do not wish to 
have their phone numbers published. (b) Those with 
unlisted landline numbers would be included in the sam- 
pling frame. 


8.47: (a) The wording is clear, but will almost certainly be 
slanted toward a high positive response. (b) The wording is 
clear, and it makes the case for a national health care system, 
and so will slant responses toward “yes.” (c) This survey 
question is most likely to produce a response similar to: 
“Uhh ... yes? I mean, no? I’m sorry, could you repeat the 
question?” 


8.49: In Canada, as in many places, elected officials aren’t 
necessarily qualified to make decisions about statistics. Critics 
of the proposal are worried that the sample will not be repre- 
sentative of the population—presumably because people that 
fill out the optional long-form questions will be systematically 
different from those that don’t. Larger samples do not address 
such problems of bias. 


Chapter 9 Producing Data: Experiments 


9.1: (a) The explanatory variable is the amount of alcohol 
drunk. The response is heart disease. (b) Weight, sleep, exer- 
cise, etc., would be considered to be lurking (or confounding) 
variables. (c) Without an experiment, no causal relationship 
can be identified. 
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9.3: This is an observational study; it is not reasonable to 
conclude any cause-and-effect relationship. 


9.5: Subjects: the students. Factors: type of attack, and 
prime used. Treatments: four combinations of love thy neighbor 
prime, or eye-for-an-eye prime and attack on military or 
cultural/educational target. Response variable: rating of U.S. 
reaction to attack. 


PRIME USED 
LOVE-THY-NEIGHBOR 


EYE-FOR-AN-EYE 


Military 1 2 
Cultural 3 4 


Target 


9.7: Making a comparison between the treatment group and 
the percent finding work last year is not helpful. Over a year, 
many things can change. In order to draw conclusions, we need 
to make the $500 bonus offer to some people and not to others 
during the same time period, and compare the two groups. 


9.9: (a) Diagram follows. (b) If using Table B, label 01 to 36 
and take two digits at a time. 


Group 1 Treatment 1 


i Added water in 
/ P January to March ‘ 
Treatment 2 Measure 


Random Group 2 i : 
assignment ee plots —» aust water in ——> biomass, 
pril to June etc. 
Group 3 Treatment 3 


6 plots No added water 

9.11: In a controlled scientific study the effects of factors 
other than the nonphysical treatment (e.g., the placebo effect, 
differences in the prior health of the subjects) can be elimi- 
nated or accounted for, so that the differences in improvement 
observed between the subjects can be attributed to the differ- 
ences in treatments. 


9.13: (a) The researchers did not alter the diets. (b) Such 
language is reasonable because with observational studies; no 
“cause and effect” conclusion would be reasonable. 


9.15: “Lack of blindness” means that the experimenter 
knows which subjects were taught to meditate. He or she 
may have some expectation about whether meditation will 
lower anxiety; this could (unconsciously) influence the end- 
of-month assessment. 


9.17: (a) The explanatory variable is the font; the response is 
the degree of agreement with the statement about healthiness. 
(b) Completely randomized design: 50 students are randomly 
assigned to each of the two fonts, and then rate the healthiness 
of the product. (c) Matched pairs design: each student sees both 
fonts in random order, then rates the product’s perceived 
healthiness for each font. 


9.19: (a) Life satisfaction is observed; no treatments were 
imposed. 


9.21: (b) All participating students had the same treatment. 
9.23: (b) 


9.25: (a) The researchers did not randomly assign where 
the people lived, so no treatments were actively imposed. 


9.27: (a) The choice should be made randomly. 


9.29: (a) This is an observational study; the subjects chose 
their own “treatments.” The explanatory variable is red meat 
consumption, and the response variable is whether or not a 
subject dies. (b) Many answers are possible. For example, 
smoking is known to increase the risk of cancer. (c) Many 
answers are possible. For example, how many servings of fruits 
and vegetables were consumed along with the red meat? 


9.31: This is an experiment, because the treatment is selected 
(randomly, we assume) by the interviewer. The explanatory 
variable (factor) is the level of identification, and the response 
variable is whether or not the interview is completed. 


9.33: (a) In an observational study, we simply observe sub- 
jects who live near highways and compare them with others 
who do not live near highways. In an experiment, we would 
assign where the subjects live. (b) Answers will vary. For 
example, those who live near highways might have less money 
and be attracted to the (possibly) lower housing costs near a 
highway. (c) Not feasibly. Who would want their housing situ- 
ation randomly assigned? 


9.35: (a) Diagram follows. (b) Assign labels 001 to 120. If 
using Table B, line 108 gives 090, 009, 067, 092, 041, 059, 040, 
080, 029, 091. 


Group 1 Treatment 1 
—" 60 schools —* Use camera > 
Random Compare 
assignment attendance 
NY Group2 Treatment2 ~-—* 
60 schools Control 


9.37: (a) The outline is shown. (b) From line 120, we choose 
subjects corresponding to the numbers 16, 04, 26, 21, 19, 07, 
22, 10, 25, 13, 15, 05, 29, 09, 08 for the first group, and the rest 
for group 2. Thus, the marijuana group consists of Mattos, 
Bower, Williams, Sawant, Reichert, DeVore, Scannell, Giri- 
unas, Stout, Kennedy, Mani, Burke, Zaccai, Fritz, and Fleming. 
All other subjects are assigned to the non-marijuana group. 
(c) This could be a double-blind experiment, assuming that 
subjects can’t distinguish between the types of marijuana 
smoked and the persons measuring output and earnings of sub- 
jects don’t know what kind of marijuana a subject smoked. 
Ensuring these conditions is very unlikely. 


Group 1 a Treatment 1 
_ 15 subjects Strong marijuana Campa 
Helio Migcan 
assignment 
Group 2 Treatment2 ~~” ee 
15 subjects Weak marijuana 


9.39: (a) There are two factors. The first factor is type of 
granola, and has two levels (regular and low-fat). The sec- 
ond factor is serving size label, and has three levels (2 serv- 
ings, 1 serving, and no label). There are six treatment 
combinations (regular granola at 2 servings, regular gra- 
nola at 1 serving, regular granola with no serving label, 
low-fat granola with 2 servings, low-fat granola with 1 serv- 
ing, and low-fat granola with no serving label). At 20 sub- 
jects per treatment, there were 120 subjects in the 
experiment. (b) 


GRANOLA TYPE 
SERVING SIZE LABEL REGULAR LOW-FAT 
2 servings 20 subjects 20 subjects 
1 serving 20 subjects 20 subjects 
No label 20 subjects 20 subjects 


9.41: The factors are pill type and spray type. “Double-blind” 
means that the treatment assigned to a patient was unknown 
to both the patient and those responsible for assessing the 
effectiveness of that treatment. “Placebo-controlled” means 
that some of the subjects were given placebos. Even though 
placebos possess no medical properties, some subjects may 
show improvement or benefits just as a result of participating 
in the experiment. 


9.43: (a) The subjects are randomly chosen (preferably peo- 
ple who like flavored water). Each subject tastes two cups of 
flavored water, in identical unlabeled cups. One contains 
MiO, the other the ready to drink product, in random order. 
Preference is the response variable. (b) We must assign 10 cus- 
tomers to get regular coffee first. Label the subjects 01 to 20. 
Starting at line 138, the “MiO first” group is: 16, 08, 15, 13, 17, 
04, 10, 19, 12, and 18. 


9.45: Each player will be put through the sequence (100 
yards, four times) twice—once with oxygen and once without. 
For each player, randomly determine whether to use oxygen on 
the first or second trial. Allow ample time (perhaps a day or 
two) between trials for full recovery. 


9.47: The diagram is shown. The last stage (“Observe heart 
health”) might be described in more detail. 


Group 1 Treatment 1 
100 subjects Just wine 
Random Group 2 Treatment 2 oe 
assignment 100 subjects Just beer 
health 
Group 3 Treatment 3 
100 subjects Just spirits 


9.49: Any experiment randomized in this way assigns all the 
women to one treatment and all the men to the other. Sex is 
completely confounded with treatment. If women and men 
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respond differently, the experiment will be strongly biased. The 
direction of the bias is random, depending on the coin toss. 


9.51: (a) “Randomized” means that patients were randomly 
assigned to receive either SAMe or a placebo. “Double-blind” 
means that the treatment assigned to a patient was unknown 
to both the patient and those responsible for assessing the 
effectiveness of that treatment. Even though placebos possess 
no medical properties, some subjects may show improvement 
or benefits just as a result of participating in the experiment; 
the placebos allow those doing the study to observe this effect. 
(b) Statistical significance means that the SAMe group had a 
difference in response (more had a positive response) that 
could not be attributed to chance. This means that it appears 
SAM¢e helps reduce depression when used with standard treat- 
ment. (c) Diagram follows. 


Group 1 Treatment 1 
F ee Observe 
39 patients SAMe : 
Random Ze SS change in 
assignment Hamilton 
~ Group 2 Treatment 2 ae 
34 patients Placebo cate 


Chapter 10 Data Ethics 


As the text states, “Most of these exercises pose issues for dis- 
cussion. There are no right or wrong answers, but there are 
more and less thoughtful answers.” We have not tried to supply 
answers for exercises that are largely matters of opinion. For 
that reason, only a few are provided. 


10.1: These six proposals are clearly in increasing order of 
risk. Most students will consider that option (a) qualifies as 
minimal risk, and most will agree that option (e) goes beyond 
minimal risk. 


10.3: Certainly the prospect of losing needed health services 
would induce most subjects to agree to participate. Likewise, 
an employer’s “voluntary” participation might be rewarded (or 
penalized) in job performance evaluations. Both of these could 
be termed coercive. 


10.11: Most students will see option (a) as allowable, option 
(b) as questionable even though the meetings are public, and 
option (c) as not allowable due to the psychologist “pretending 
to be converted.” 
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11.1: (c) hives with bees; hives with no bees; no hives 


11.3: (a) Label the students 01 through 35 alphabetically. 
(b) Starting on line 115, the selected students are 04 = Bower, 
17 = Huling, 32 = Vore, 22 = Newburg, 09 = Ding, and 
26 = Pulak. (c) The response variable is “How much I trust 
the Internet for health information.” 


11.5: (d) 


11.7: (a) No treatments were assigned by the researchers. 
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11.9: Many answers are possible. One possible lurking vari- 
able is “student attitude about the purpose of college” (stu- 
dents with a view that college is about partying, rather than 
studying may be more likely to binge drink and more likely to 
have lower grades). A correct example of a lurking variable 
must be a variable that simultaneously drives both “GPA” and 
“binge drinking” together. 


11.11: (d) 30-second once, 30-second twice, 45-second 
once, 45-second twice, 60-second once, and 60-second twice 


11.13: People who visit the NOVA Science Now website 
don’t represent American adults broadly. Those people taking 
the survey went out of their way to participate in this online 
poll, and they read pro and con arguments after watching a 
program about the issue. It seems reasonable to believe that 
these people understand the issues better than most American 
adults. 


11.15: (b) No treatments were assigned. Testing eyesight 
and IQ are not applying treatments. 


11.17: (a) This is the definition of informed consent. 


11.19: Placebos do work with real pain, so the placebo 
response tells nothing about physical basis of the pain. In 
fact, placebos work poorly in hypochondriacs. The survey is 
described in the April 3, 1979, edition of the New York Times. 


11.21: (a) increase (b) decrease (c) increase (d) decrease 


Note: The first and third statements make an argument in favor of 
a national health insurance system, while the second and fourth 
suggest reasons to oppose it. 


11.23: (a) The factors are storage method (three levels: 
fresh, room temperature for one month, refrigerated for one 
month) and preparation method (two levels: cooked immedi- 
ately or after one hour). There are therefore six treatments. 
The response variables are the tasters’ color and flavor ratings. 
(b) Randomly allocate n potatoes to each of the six groups, 
then compare ratings. (c) For each taster, randomly choose the 
order in which the fries are tasted. We’ll note that tasters may 
become confused with six different batches to taste, however. 


Cooked Wait 
immediately one hour 
Fresh 
Stored 
Refrigerated 


11.25: (a) This is an observational study; the subjects chose 
their own “treatments” (how much to drink). The explanatory 
variable is alcohol consumption, and the response variable is 
whether or not a subject dies. (b) Many answers are possible. 
For example, some nondrinkers might avoid drinking because 
of other health concerns. We do not know what kind of alco- 
hol (beer? wine? whiskey?) the subjects were drinking. 


Chapter 12 Introducing Probability 


12.1: In a large number of five-card poker hands, the 
fraction in which you will be dealt a straight flush is about 
1/64,974. It does not mean that exactly 1 out of 64,974 such 
hands would yield a straight flush. 


12.3: (a) There are 21 zeroes among the first 200 digits of the 
table (rows 101-105), for a proportion of 0.105. (b) Answers 
will vary, but more than 99% of all students should get between 
7 and 33 heads out of 200 flips when p = 0.1. 


12.5: (a) S = {lives on campus, lives off campus}. (b) S = 
{All numbers between and years}. (Choices 
of upper and lower limits will vary, due to characteristics of 
your institution.) (c) S = {0000, 0001, 0002,... , 9999}. 
(d) S = {A, B, C, D, F} (students might also include W, “+” 
and “—”, depending on your institution). 


12.7: Add 1 to each pair-total found in the solution to 
Exercise 12.6: S = {3, 4, 5, 6, 7, 8, 9}. Each of the 16 possible 
pairings is equally likely, so (for example) the probability of a 
total of 5 is 3/16, because 3 pairings add to 4 (and then we add 
1). The complete set of probabilities is shown. 


TOTAL PROBABILITY 
3 1/16 = 0.0625 
4 2/16 = 0.125 
5 3/16 = 0.1875 
6 4/16 = 0.25 
7 3/16 = 0.1875 
8 2/16 = 0.125 
9 1/16 = 0.0625 


12.9: (a) Event B specifically rules out obese subjects, 
so there is no overlap with event A. (b) A or B is the event 
“The person chosen is overweight or obese.” P(A or B) = 
P(A) + P(B) = 0.36 + 0.33 = 0.69. (c) P(C) = 1 — P(A orB) = 
1 — 0.69 = 0.31. 


12.11: (a) Disjoint. (b) Not disjoint. $300,000 is more 
than $100,000 and more than $250,000. (c) Disjoint. 3 + x 
cannot equal 3. 


12.13: (a) A = {4, 5, 6, 7, 8, 9}, P(A) = 0.097 + 0.079 + 
0.067 + 0.058 + 0.051 + 0.046 = 0.398. (b) B = {2, 4, 6, 8}, 
P(B) = 0.176 + 0.097 + 0.067 + 0.051 = 0.391. (c) Aor B = 
{2, 4, 5, 6, 7, 8, 9}, P(A or B) = 0.176 + 0.097 + 0.079 + 
0.067 + 0.058 + 0.051 + 0.046 = 0.574. This is different 
from P(A) + P(B) because A and B are not disjoint. 


12.15: (a) P(Y $0.6) = 0.6. (b) P(Y < 0.6) = 0.6. (c) P(0.4 = 
Y <= 0.8) = 0.4. (d) P(0.4 < Y < 0.8) = 0.4. The only 
difference between parts (c) and (d) is the inclusion of the 
point Y = 0.4. This has 0 probability for a continuous variable. 


12.17: (a) P(X = 35) (b) P(X = 35) = PZ > 234) = 
P(Z = 1.49) = 1 — 0.9319 = 0.0681 (using Table A) 


12.19: (a) Y = 8 means the student runs the mile in 8 min- 
utes or more. P(Y = 8) = P(Z = *>4"4) = 1 — 0.8849 = 
0.1151 (using Table A). (b) “The student could run mile 
in less than 6 minutes” is the event Y < _ 6. 
P(Y < 6) = P(Z < S4554) = 0.0668 (using Table A). 


12.21: (a) If Joe says P(Syracuse wins) = 0.1, then he 
believes P(Duke wins) = 0.3 and P(North Carolina wins) = 
0.2. (b) Joe’s probabilities for Duke, Syracuse, and North 
Carolina add up to 0.6, so that leaves probability 0.4 for all 
other teams. 


12.23: (b) The set {0, 1, 2, 3, 4, 5} lists all possible counts. 
12.25: (b) The other probabilities add to 0.97. 


12.27: (b) P(not Republican) = 1 — P(Republican) = 1 — 
0.25 = 0.75. 


12.29: (c) “7 or greater” means 7, 8, or 9—three of the ten 
possibilities. 


12.31: (c) Y> 1 standardizes to Z > 2.56. Table A gives 0.0052. 


12.33: (a) legitimate (even though it is not a “fair” die) 
(b) legitimate (even if the deck of cards is not!) (c) not 
legitimate 


12.35: We have dropped the trailing zeroes from the land 
area figures. (a) P(area is forested) = 4176/9094 = 0.4592 
(b) P(area is not forested) = 1 — 0.4592 = 0.5408 


12.37: (a) The given probabilities add to 0.97, so other 
colors must account for the remaining 0.03. (b) P(white or sil- 
ver) = 0.25 + 0.18 = 0.43, P(neither white nor silver) = 1 — 
0.43 = 0.57 


12.39: The probabilities of 2, 3, 4, and 5 are unchanged 
(1/6), so P(1 or 6) must still be 1/3. If P(6) = 0.2, then P(1) = 
1/3 — 0.2 = 2/15. 


eee Le] EL OE Gl Ee Bg 


Probability 013 1/6 1/6 1/6 1/6 02 


12.41: (a) It is legitimate because every person must fall into 
exactly one category, the probabilities are all between 0 and 1, 
and they add up to 1. (b) 0.169 = 0.002 + 0.008 + 0.153 + 
0.006 (c) 0.355 = 1 — 0.645 


12.43: (a) A corresponds to the outcomes in the first col- 
umn or the third row. (b) P(A) = 0.550. This is different from 
the sum of the probabilities in parts (c) and (d) of Exercise 
12.42 because that sum counts the overlap (0.168) twice. 


12.45: (a) X is discrete, because it has a finite sample space. 
(b) “At least one nonword error” is the event {X = 1} or 
{X>0}. PX = 1) =1 -— P(X =0) = 0.9. © {XS 2} is “no 
more than two nonword errors,” or “fewer than three non- 


word errors.” P(X S 2) = P(X = 0) + P(X = 1) + P(X =2) = 
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0.1 + 0.2 + 0.3 = 0.6. P(X < 2) = P(X = 0) + P(X = 1) = 
0.1 + 0.2 = 0.3. 


12.47: (a) Just using initials: {(A, D), (A, M), (A, S), (A, 
R), (D, M), (D, S), (D, R), (M, S), (M, R), (S, R)}. (b) Each 
has probability 1/10 = 10%. (c) Mei-Ling is chosen in 4 of the 
10 possible outcomes: 4/10 = 40%. (d) There are 3 pairs with 
neither Sam nor Roberto, P = 3/10. 


12.49: The possible values of Y are 1, 2, 3,..., 12, each 
with probability 1/12. Aside from drawing a diagram showing 
all the possible combinations, one can reason that the first 
(regular) die is equally likely to show any number from 1 
through 6. Half of the time, the second roll shows 0, and the 
other half it shows 6. Each possible outcome therefore has 
probability (1/6) (1/2) = 1/12. 


12.51: (a) This is a continuous random variable because the set 
of possible values is an interval. (b) The height should be 1/2 because 
the area under the curve must be 1. (For a rectangle, area = 
L X W) The density curve is illustrated. (c) P(Y S 1) = 1/2 


0 1 2) 
Height 
12.53: (a) P(0.49 = V = 0.53) = POS eZ =Z2 
053 — O51) = P(-2.22 = Z S 2.22) = 0.9868 — 0.0132 = 
0.9736 (b) P(V = 0.49) = P(Z = #5) = PZ = 
722) 00132, 


12.55: (a) Because there are 10,000 equally likely four-digit 
numbers (0000 through 9999), the probability of an exact 
match is 1/10,000 = 0.0001. (b) There are a total of 24 = 4 X 
3 X 2 X 1 arrangements of the four digits 5, 9, 7, and 4 (there 
are four choices for the first digit, three for the second, two for 
the third), so the probability of a match in any order is 
24/10,000 = 0.0024. 


12.57: (a)—(c) Results will vary, but after n tosses, the distri- 
bution of the proportion (call it p) is approximately Normal 
with mean 0.5 and standard deviation 1/(2Vn), while the 
distribution of the count of heads is approximately Normal 
with mean 0.5n and standard deviation Vn/2, so using 
the 68—95-99.7 rule, we have the results shown in the table. 
Note that the range for the proportion p gets narrower, while 
the range for the count gets wider. 


n 99.7% RANGE FOR p 99.7% RANGE FOR COUNT 
40 0.5 + 0.237 20 + 9.5 
120 0.5 + 0.137 60 + 16.4 
240 0.5 + 0.097 120 + 23.2 
480 0.5 + 0.068 240 + 32.9 
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12.59: (a) With n = 50, the variability in the proportion 
(call it 6) is larger. With n = 100, nearly all answers will be 
between 0.19 and 0.47. With n = 400, nearly all answers will 
be between 0.26 and 0.40. (b) Results will vary. 


Chapter 13 General Rules of Probability 


13.1: It is unlikely that these events are independent. In par- 
ticular, it is reasonable to expect that younger adults are more 
likely than older adults to be college students. 


13.3: If we assume that each site is independent of the oth- 
ers, P(all seven are still good) = (1 — 0.13)’ = 0.3773. 


13.5: A Venn diagram is provided. (a) P(degree was 
earned by a man) = P(not W) = 1 — 0.59 = 0.41, or 41%. 
(b) P(B and not W) = 0.50 — 0.29 = 0.21, or 21%. (c) 
Because P(B and M) = 0.21, but P(B) X P(M) = (0.50) 
(0.41) = 0.205, P(B and M) # P(B) X P(M). So, B and M 
are not independent. 


Note: While the multiplication here results in a “close” result to P(B 
and M), in the probabilistic sense, these must be exactly equal to 
have independence. 


Band not W W and not B 
0.21 0.30 


Neither B nor W 
0.20 


13.7: P(W|B) = Mas = 055 = 0.58 


13.9: Let H be the event that an adult belongs to a club, 
and T be that he/she goes at least twice a week. Note that P(T 
and H) = P(T), because one has to be a member to attend. 
P(T) = P(H)P(T | H) = (0.10) (0.40) = 0.04. 


13.11: (a) and (b) These probabilities are provided follow- 
ing. (c) The product of these conditional probabilities gives 
the probability of a flush in spades. The product is about 
0.0004952. (d) There are four possible suits in which to have a 
flush, so P(flush) is four times that found in part (c), or about 
0.001981. 


P(1st card is @) 5 = 0.25 

P(2nd card is # | first card is #) # = 0.2353 

P(3rd card is @ | first two are #) i= 0.22 

P(4th card is @ | first three are @) iy = 0.2041 
( 


P(5th card is @ | first four are 4) x = 0.1875 


13.13: (a) The tree diagram is shown. (b) Two branches 
result in a positive test result. P(Positive test) = (0.063) (0.21) + 
(0.937) (0.06) = 0.01323 + 0.05622 = 0.06945. 


oe 


Cancer e 
0.063 a Test— 
e 
- oe Test + 
No cancer e 
0) 


06 
sk Test — 


no cancer and positive test) __ 


si ( 
13.15: (a) P(nocancer | positive test) = ° Pipealiive tat) 
oe = aoe 0.8095. The 0.06945 in the denominator 


was found in Exercise 13.13. (b) About 81% of positive test 
results come from people without prostate cancer. Most of 
those being treated do not have the disease, and might suffer 
those serious side effects. 


13.17: (b) P = (1 — 0.02)’ = (0.98)* = 0.9412 


13.19: (a) Because the tests are independent, P(at least one 
positive) = 1 — P(both negative) = 1 — (0.1) (0.2) = 0.98. 


13.21: (c) Of 4798 education doctorates, 1501 were awarded 
to males. 1501/4798 = 0.3128. 


13.23: (c) Outcome A (engineering doctorate) is what has 
been given (signaled by the phrase “of engineering doctorates”). 


13.25: (c) P(Wand D) = P(W)P(D | W) = (0.86) (0.028) = 
0.024 


13.27: P(8 losses) = (1 — 0.25)§ = 0.1001 


13.29: (a) P(win the jackpot) = (35)()(z5) = 0.001125. (b) 
The other (non-cherry) symbol can show up on the middle 
wheel, with probability (45)(3$)(z) = 0.001375, or on either 
of the outside wheels, with probability = (38)(35)(z5) (each). 
(c) Combining all three cases from part (b), we have P (exactly 
two cherries) = 0.001375 + 2(0.021375) = 0.044125. 


13.31: PLAN: Let I be the event “infection occurs” and let F 
be “the repair fails.” We have been given P(I) = 0.03, P(F) = 
0.14, and P(I and F) = 0.01. We want P(not I and not F). 
SOLVE: First, P(I or F) = P(I) + P(F) — P(Land F) = 0.03 + 
0.14 — 0.01 = 0.16. Observe that the desired probability is the 
complement of “I or E” P(not I and not F) = 1 — P(lor F) = 
0.84. CONCLUDE: 84% of operations succeed and are free 


from infection. 


Neither | nor F 


Repair fails 


13.33: Let! be the event “infection occurs” and let F be “the 
repair fails.” Refer to the Venn diagram in Exercise 13.31. 
P(I| not F) = Pognenn®) = oR = 0.0233 


13.35: (a) P(positive | disease) = 564/574 = 0.9826 
(b) P(no disease | negative) = 708/718 = 0.9861 


13.37: (a) These events are not independent; P(pizza with 
mushrooms) = 4/7, but P (mushrooms | thick crust) = 2/3. 
(b) With the eighth pizza, P (mushrooms) = 4/8 = 1/2, and 
P (mushrooms | thick crust) = 2/4 = 1/2, so these events are 
independent. 


13.39: Let Wbe the event “the person is a woman” and M be 
“the person earned a Master’s degree.” (a) P(not W) = 
1569/3825 = 0.4102 (b) P(not W | M) = 322/816 = 0.3946 
(c) The events “choose a man” and “choose a Master’s degree 
recipient” are not independent. If they were, the probabilities 
in parts (a) and (b) would be equal. 


13.41: Let D be the event “a seedling was damaged by a 
deer.” (a) P(D) = 209/871 = 0.2400 (b) The conditional prob- 
abilities are 


P(D|no cover) = 60/211 = 0.2844 

P(D | cover < 1/3) = 76/234 = 0.3248 
P(D | 1/3 to 2/3 cover) = 44/221 = 0.1991 
P(D | cover > 2/3) = 29/205 = 0.1415 


(c) Cover and damage are not independent; P(D) decreases 
noticeably when thorny cover is 1/3 or more. 


13.43: Having no thorny cover means there is less than 1/3 
thorny cover. This conditional probability is P(cover < 1/3 | 
D) = (60 + 76)/(60 + 76 + 44 + 29) = 136/209 = 0.6507, 
or 65.07%. 


13.45: This is P(A and (not B) and (not C)) = 0.35. 


13.47: (a) P(doubles on first toss) = 1/6 (b) We need no 
doubles on the first roll, then doubles on the second toss. 
P(first doubles appears on toss 2) = (5/6) (1/6) = 5/36 (c) Sim- 
ilarly, P(first doubles appears on toss 3) = (5/6)?(1/6) = 25/216 
(d) P(first doubles appears on toss 4) = (5/6)3(1/6), ete. 
In general, P(first doubles appears on toss k) = (5/6)k'(1/6). 
(e) P(go again within 3 turns) = P(roll doubles in 3 or fewer 


rolls) = (1/6) + (5/6) (1/6) + (5/6)?(1/6) = 0.4213. 


13.49: PLAN: We construct a tree diagram showing the 
results (allergic or not) for each of the three individuals. 
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SOLVE: In the tree diagram, each “up-step” represents an aller- 
gic individual. At the end of each of the eight complete branches 
is the value of X. Any branch with two up-steps and one down- 
step has probability 0.02? 0.98! = 0.000392, and yields X = 
2. Any branch with one up-step and two down- steps has prob- 
ability 0.02! x 0.98? = 0.019208, and yields X = 1. 


X=3 
Allergic ee ae 
ee SSS 
Allergic e 
= Nae ee as 
allergic a 
X=1 
e 
X=2 
Allergic ae 
ee ae 
Not . 
allergic oo 
Not Be a 
allergic ° 
es 
There are three branches each corresponding to X = 2 


and X = 1, and only one branch each for X = 3 and X = 0. 
P(X = 0) = 0.98? = 0.941192 and P(X = 3) = 0.023 = 
0.000008. Meanwhile, P(X = 1) = 3(0.02)!(0.98)? = 0.057624, 
and P(X = 2) = 3(0.02)7(0.98)! = 0.001176. CONCLUDE: 
P(X = 0) = 0.941192, P(X = 1) = 0.057624, P(X = 2) = 
0.001176, and P(X = 3) = 0.000008. 

13.51: P(X = 2|X = 1) = 5 eee = eet = 


0.001176 __ 
1 — 0.941192 ~~ 0.020. 


13.53: Let R = {recent donor}, P = {pledged}, and C = 
{contributed}. (a) P(C) = (0.5) (0.4) (0.8) + (0.3) (0.3) (0.6) + 
(0.2) (0.1) (0.5) = 0.224, or 22.4% (b) P(R|C) = My = 
(O58) — 0.7143, or 71.4% 


13.55: The proportion having combination (16, 17) is 
2(0.232) (0.213) = 0.098832. 


13.57: Ifthe DNA profile found on the hair is possessed by 1 
in 1.6 million individuals, then we would expect about 3 indi- 
viduals in the database of 4.5 million convicted felons to dem- 
onstrate a match. (4.5 million)/(1.6 million) = 2.8125, which 
was rounded up to 3. 


Chapter 14 Binomial Distributions 


14.1: Binomial. (1) We have a fixed number of observations 
(n = 20). (2) It is reasonable to believe that each call is inde- 
pendent of the others. (3) “Success” means reaching a working 
residential number; “failure” is any other outcome. (4) Each 
randomly dialed number has chance p = 0.3 of reaching a live 
person. 
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14.3: Not binomial. The trials aren’t independent. If one tile 
in a box is cracked, there are likely more tiles cracked. 


14.5: (a) C, the number caught, is binomial with n = 10 and 
p = 0.7. M, the number missed, is binomial with n = 10 and 
p = 0.3. (b) P(M = 3) = (7)(0.3)0.7)' = (120)(0.027) 
(0.08235) = 0.2668. With software, P(M = 3) = 0.6172. Use 
the fact that “3 or more” is the complement of “2 or fewer” with 
most software. 


14.7: (a)5 choose 2 returns 10. (b) 500 choose 2 returns 
124,750, and 500 choose 100 returns 2.041694 x 10! 
(c) (10 choose 1) *0.11*0.89%9 returns 0.38539204407. 


14.9: (a) X is binomial with n = 10 and p = 0.3; Y is 
binomial with n = 10 and p = 0.7. (b) The mean of Y is 
(10) (0.7) = 7 errors caught, and for X the mean is (10) (0.3) = 
3 errors missed. (c) The standard deviation of Y (or X) is 


o = V10(0.7)(0.3) = 1.4491 errors. 


14.14: (a) w = (1175)(0.37) = 434.75 and o = 
V1175(0.37)(1 — 0.37) = V273.8925 = 16.550 _ students. 
(b) np = (1175)(0.37) = 434.75 = 10 and n(l1—p)= 


(1175)(0.63) = 740.25 = 10, so n is large enough for the Nor- 
mal approximation. The college wants 450 students, so 
P(X = 451) = P(Z = PLB") = P(Z = 0.98) = 0.1635. 
(c) The exact binomial probability is 0.1706; the Normal approxi- 
mation is 0.0071 too low. (d) To decrease the chance of more 
students than they want, they need to decrease the number 
admitted. Ifn = 1145, we have P(more than 450) = 0.047; with 
n = 1146, we have P = 0.0494. 


14.13: (b) He has three independent eggs, each with proba- 
bility 1/4 of containing salmonella. 


14.15: (c) The selections are not independent; once we 
choose one student, it changes the probability that the next 
student is a business major. 


14.17: (b) (0.25) (0.75)*(0.25) = 0.0264 
14.19: (b) 8 or 9 are two of the ten possible digits, so 0.20. 


14.24: (a) u = np = (80)(0.20) = 16 


14.23: (a) A binomial distribution is not appropriate, because 
the kicker faces different situations each time (wind, distance, 
etc.); his probability of success is likely to change from one 
attempt to another. (b) It would be reasonable to use a binomial 
distribution for free throws made because each is from the same 
position with respect to the basket with no interference allowed 
for the shot, and presumably each is independent of any others. 


14.25: (a) n =5 and p = 0.65. (b) S = {0, 1, 2, 3, 4, 5} 
(c) All cases computed. 

P(X = 0) = (2)(0.65)%(0.35)> = 0.00525 

P(X = 1) = (7)(0.65)(0.35)* = 0.04877 

P(X = 2) = (3)(0.65)(0.35)> = 0.18115 


P(X = 3) = (3)(0.65)(0.35)? = 0.33642 
P(X = 4) = (7)(0.65)*(0.35)! = 0.31239 
P(X = 5) = (2)(0.65)°(0.35)° = 0.11603 
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(d) w = np = (5)(0.65) = 3.25 and o@ = V5(0.65)(1 — 0.65) 
= 1.0665 years. The mean y is indicated on the probability 
histogram. 


14.27: (a) One woman getting pregnant will not affect 
whether another gets pregnant (independent trials), and (we 
assume) each has the same probability of getting pregnant. 
(b) Under ideal conditions, the number who get pregnant is 
binomial with n = 20 and p = 0.01; P(N = 1) = 1 — P(N = 
0) = 1 — 0.8179 = 0.1821. In typical use, p = 0.03, and 
P(N = 1) = 1 — 0.5438 = 0.4562. 


14.29: (a) X is binomial with n = 600 and p = 0.03. The 
Normal approximation is safe: np = 18 andn(1 — p) = 582 are 
both at least 10. The mean is 18, and o = 4.1785, so P(X = 
20) =P(Z = “Sa7e3*) = P(Z = 0.48) = 1 — 0.6844 = 0.3156. 
The exact binomial probability is 0.3477. (b) Under ideal con- 
ditions, p = 0.01, so np = 6 is too small. 


14.31: (a) If R is the number of red-blossomed plants, then 
PR = 3) = ree = 0.4219. (b) With n = 60, 
p= np = (60) (0.75) = 45. (c) P(R = 45) = P(Z = 0) = 0.5000 


(software gives 0.5688 as the exact binomial probability). 


14.33: (a) Of 720,783 total vehicles, Elantras accounted for 
a proportion of 247,912/720,783 = 0.3439. (b) If E is the num- 
ber of Elantra buyers, E has the binomial distribution with n = 
1000, and p = 0.3439. w = np = (1000) (0.3439) = 343.9 and 
o = Vnp(1 — p) = V1000(0.3439)(T — 0.3439) = 15.021 
Elantra buyers. (c) P(E < 400) = P(E S$ 399) = PZ Ss 
—3.67) = 0.0001 


14.35: (a) Withn = 100, w = 75 anda = 4.3301 questions, 
so P(70 = X = 80) = P(-1.15 SZ S 1.15) = 0.7498 (soft- 
ware gives 0.7518). (b) With n = 250, w = 187.5 and a = 
6.8465 questions, and a score between 70% and 80% 


means 175 to 200 correct answers, so P(175 = X S 200) = 
P(—1.83 S Z S 1.83) = 0.9328 (software gives 0.9428). If 
one used the more mathematical idea that P(70 < X < 80) = 
P(71 = X S 79) = 0.6424 for the 100 question test and 
P(176 < X < 199) = 0.9328 for the 250 question test. 


14.37: (a) Answers will vary, but over 99.8% of samples 
should have 0 to 4 bad tomatoes. (b) Each time we choose a 
sample of size 10, the probability that we have exactly 1 bad 
tomato is 0.3854; out of 20 samples, the number of times that 
we have exactly 1 bad tomato has a binomial distribution with 
parameters n = 20 and p = 0.3854. Most students will find 
that between 2 and 14 of their 20 samples have exactly 1 bad 
tomato. 


14.39: The number N of infections among untreated BJU 
students is binomial with n = 1400 and p = 0.80, so w = 1120 
and 0 = 14.9666 students. 75% of that group is 1050; the 
Normal approximation is safe because (1400)(0.80) = 1120 
and (1400) (0.20) = 280 are both at least 10. P(N = 1050) = 
PZ= Oe ae) = P(Z = —4.68), which is very near to 1. 
(Exact binomial computation gives 0.999998.) 


14.41: Let V and U be (respectively) the number of new 
infections among the vaccinated and unvaccinated children. 
(a) P(V = 1) = 0.3741 and P(U = 1) = 0.0960. Because these 
events are (assumed) independent, P(V = 1 and U = 1) = 
(0.3741) (0.0960) = 0.0359. (b) P(2 infections) = P(V = 0 
and U = 2) + P(V = land U = 1) + PV = 2 and U = 0) = 
P(V = 0)P(U = 2) + P(V = 1)P(U = 1) + PV = 2)P(U = 0) 
= (0.4181) (0.3840) + (0.3741) (0.0960) + (0.1575) (0.0080) = 
0.1977 


14.43: The number X of fairways hit is binomial with n = 30 
and p = 0.63. (a) nb = 18.9 and n(1 — p) = 11.1, so the 
Normal approximation is safe. (b) w = np = 18.9, and 
a = V30(0.63)(0.37) = 2.6444. Using the Normal approxi- 
mation, P(X = 23) = P(Z = 1.55) = 0.0606. (c) With the 
continuity correction, P(X = 23) = P(X = 22.5) = P(Z = 
1.36) = 0.0869 (using Table A). The answer using the conti- 
nuity correction is closer to the exact answer (0.0838). 


Chapter 15 Sampling Distributions 


15.1: Both are statistics; they came from the 11 subjects in 
the experiment. 


15.3: 98.8% and 22 are statistics; they are based on the sur- 
vey of 500 American anabolic androgenic steroid users. 50.0% 
is a parameter. 


15.5: Although the probability of having to pay for a total 
loss for is very small, if this were to happen, it would be finan- 
cially disastrous. For thousands of policies, the law of large 
numbers says that the average claim on many policies will be 
close to the mean, so the insurance company can be assured 
that the premiums collected will (almost certainly) cover the 
claims. 
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15.7: (a) Histogram is shown. (b) The mean is pw = 69.4 (c) 
and (d) Results will vary. The results of one sample selected 
students 5, 2, 9, and 4 with scores of 72, 63, 75, and 55. Their 
mean is xX = (72 + 63+ 75 + 55)/4 = 66.25. Students should 


repeat this process until they have 10 sample means. 


<\(v) Score 


Count 


15.9: (a) The sampling distribution of x is N(115, 
25/V/100) = N(115 mg/dL, 2.5 mg/dL). P(112 < x < 118) = 
P(-1.2 <Z < 1.2) = 0.8849 — 0.1151 = 0.7698, using Table 
A. (b) With n = 1000, the sample mean has the N(115 mg/dL, 
0.7906 mg/dL) distribution, so P(112 <x < 118) = P(—3.79 < 
Z < 3.79) = 0.9998. 


15.11: No: the histogram of the sample values will look like 
the population distribution, whatever it might happen to be. The 
central limit theorem says that the histogram of sample means 
(from many large samples) will look more and more Normal. 


15.13: PLAN: Use the central limit theorem to approxi- 
mate the probability that the mean loss for 10,000 insurance 
policies is no more than $135. SOLVE: In spite of the skew- 
ness of the population distribution, the average loss among 
10,000 policies will be approximately N($125, $300/ 
10,000) = N($125, $3). P(x = $135) = PZ = '@5)= 
P(Z S 3.33) = 0.9996. CONCLUDE: We can be about 
99.96% certain that average losses will not exceed $135 per 
policy. 


15.15: Answers will vary due to randomness. Students 
should find that a sample mean of 76 will almost never occur; 
this indicates that the mean for the four honors students is 
unusually high. 


15.17: (b) statistic; this is a proportion of the people inter- 
viewed in the sample of 60,000 households 


15.19: (b) The law of large numbers says that the mean from 
a large sample is close to the population mean. Statement (c) 
is also true, but is based on the central limit theorem, not on 
the law of large numbers. 


15.21: (c) The standard deviation of the distribution of xX is 
a/Vn. 


15.23: (c) The central limit theorem says that the mean 
from a large sample has (approximately) a Normal distribution. 
Statement (a) is also true, but is based on the law of large num- 
bers, not on the central limit theorem. 
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15.25: 1 isa parameter; 1.07 is a statistic (the mean of the 10 
measurements in the sample). 


15.27: In the long run, the gambler earns an average of 94.7 
cents per bet. In other words, the gambler loses (and the house 
gains) an average of 5.3 cents for each $1 bet. 


15.29: Let X be Shelia’s measured glucose level. (a) an > 
130) = P(Z > 0.67) = 0.2514. (b) x has a N(122, 12/V 4) = 
N(122 mg/dL, 6 mg/dL) distribution; P(x > 130) = P(Z > 
1.33) = 0.0918. 


15.31: The mean of four measurements has a N(122 mg/dL, 
6 mg/dL) distribution, and P(Z > 1.645) = 0.05 if Z is N(0, 1), 
so L = 122 + 1.645 X 6 = 131.87 mg/dL. 


15.33: (a) x will have an approximately Normal distribution 
with mean 8.8 beats per five seconds, and standard deviation 
1/24 = 0.288675 beats per five seconds. (b) P(x < 8) = 
P(Z < —3.92) = essentially 0. (c) The average over 12 five- 
second intervals needs to be less than 100/12 = 8.333 beats 
per five seconds. P(x < 8.333) = P(Z < —1.62) = 0.0526. 


15.35: PLAN: Use the central limit theorem to approximate 
this probability. SOLVE: Over 40 years, x (the mean return) is 
approximately Normal with 4 = 13.3% and o = 17.0%/V 40 = 
2.688%. P(x > 10%) = P(Z > —1.23) = 0.8907, and P(x <5%) = 
P(Z < —3.09) = 0.0010. CONCLUDE: There is about an 89% 
chance of getting average returns over 10%, and a 0.1% chance 
of getting average returns less than 5%. 


15.37: Choose n so that 6.5/Vn =1. That means 
Vn = 6.5, son = 42.25. Because n must be a whole number, 
taken = 43. 


15.39: On the average, Joe loses 40 cents each time he plays 
(that is, he spends $1 and gets back 60 cents). 


15.41: (a) With n = 150,000, wz = $0.40 and o = 
$18.96/V150,000 = $0.0490. (b) P($0.30 < x < $0.50) = 
P(—2.04 < Z < 2.04) = 0.9586 


15.43: The mean is 10.5 = (3)(3.5) because a single die 
has a mean of 3.5. Sketches will vary, as will the number of 
rolls. 


Chapter 16 Confidence Intervals: The Basics 


16.1: (a) The sampling distribution of x has mean p and 
standard deviation %% = Vio = = 0.3031. (b) 95% of all 
values of x fall within two standard deviations of the sampling 
distribution of p—that is, within 2(0.3031) = 0.6062. 
(c) 285 + 0.6062, or between 284.3938 and 285.6062. 


16.3: Answers will vary due to randomness. In 99.7% of all 
repetitions of part (a), students should see between 5 and 10 
hits. Out of 1000 80% confidence intervals, nearly all stu- 
dents will observe between 76% and 84% capturing the 
mean. 


16.5: Search Table A for 0.1250. This area corresponds to 
—z = -1.15, orz” = 1.15. 


16.7: (a) A stemplot shows two low scores (72 and 74) 
that are both possible outliers, but there are no other appar- 
ent deviations from Normality. (b) PLAN: We estimate ps by 
giving a 99% confidence interval. SOLVE: The problem 
states that these girls are an SRS of the population, which is 
very large. We saw that the scores are consistent with having 
come from a Normal population, so conditions are met. 
With x = 105.84, and z* = 2.576, our 99% confidence inter- 
val for w is given by 105.84 + 2.516 ay = 105.84 + 6.94. 
CONCLUDE: We are 99% confident that the mean IQ of 
seventh-erade girls in this district is between 98.90 and 
112.78 points. 


16.9: With z* = 1.96 and o = 7.5, me. = zx t= (a) 
and (b) The margins of error are given in the table. (c) Margin 
of error decreases as n increases. 


n m.e. 
100 1.47 

400 0.735 

1600 0.3675 
16.11: (c) xy = 3.291. Using Table A, search for 0.9995. 
16.13: (b) As the confidence level increases, z* increases. 
16.15: (b) o(x) is So = View = 4.167. 
16.17: (b) As the confidence level increases, z* increases. 


This makes the margin of error larger. 


16.19: (a) We use x + z*% = 15.3 + 2.576535 =153 4 
1.018 = 14.282 to 16.318 hours. (b) The 463 students in this 
class must be a random sample of all of the first-year students 
at this university to satisfy conditions for inference. 


16.21: The margin of error is 2.576Seq = 1.02, so the extra 
observation has minimal impact on the margin of error. If 
x = 36.8, then the 99% confidence interval for average 
amount of time spent studying becomes 36.8 + 1.02 = 35.78 
to 37.82 hours. 


16.23: This student is confused. If we repeated the sample 
over and over, 95% of all future sample means would be within 
1.96 standard deviations (that is, within 1.96:7;) of the true, 
unknown value of wz. Future samples will have no memory of 
this sample. 


16.25: (a) A stemplot of the data is provided. Notice that 
the distribution is noticeably skewed to the left. The data do 
not appear to follow a Normal distribution. (b) PLAN: We 
will estimate ps by giving a 95% confidence interval. SOLVE: 
The problem states that we are willing to take this sample to 
be an SRS of the population. In spite of the shape of the 
stemplot, we are told to assume that this distribution is Nor- 
mal with standard deviation o = 3000 lb. x = 30,841 lb, so 


the 95% confidence interval for mw is given by 30,841 + 
1.962583. CONCLUDE: With 95% confidence, the mean load 
pb required to break apart pieces of Douglas fir is between 
29,526.2 and 32,155.8 pounds; however, given the shape of 
the distribution of the data, we cannot rely much on this 
interval. 


Stem and Leaf 


Stem | Leaf Count 
33 | 0237 4 
32 |033677 | 6 
31/399 3 
30 | 259 3 
29 
28 | 7 1 
27 
26 | 5 1 
25 
24 | 1 1 
23 | O 1 


23 | O represents 23000 


16.27: (a) A stemplot is given. There is little evidence 
that the sample does not come from a Normal distribution. 
For inference, we must assume that the 10 untrained stu- 
dents were selected randomly from the population of all 
untrained people. (b) PLAN: We will estimate w with a 95% 
confidence interval. SOLVE: We have assumed that we have 
a random sample, and that the population we’re sampling 
from is Normal. We obtain x = 29.4 wg/L. Our 95% confi- 
dence interval for p is given by 29.4 + 1.96<45 = 25.06 to 
33.74 pg/L. 


Stem-and-Leaf DMS N = 10 


Leaf Unit = 1 
1| 9 
2)235 
2/9 
3 | 0013 
3 | 5 
4/2 


CONCLUDE: With 95% confidence, the mean sensitivity for 
all untrained people is between 25.06 and 33.74 we/L. 


Chapter 17 Tests of Significance: The Basics 


17.1: (a) If w = 50, the sampling distribution is approxi- 
mately Normal with mw =50 and_ standard deviation 
Sn = dhs = 1. (b) The actual result lies toward the low tail of 
the curve, while 48.9 is fairly close to the middle. If 4 = 50, 
observing a value similar to 48.9 would not be too surprising, 
but 47.4 is much less likely, and therefore provides some evi- 
dence that w < 50. 
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SSHA Mean for nh = 225 Philippine Seniors 
Normal, Mean = 50, StDev = 1 


Density 


Mean 


17.3: Ho: w= 50 vs. Hy pw < 50. The teacher suspects that 
poor attitudes are in part responsible for the decline in scores. 


17.5: Hy: w = 75 vs. H,: w < 75. The professor suspects 
this section’s students perform worse than the population of 
all students in the class on average. 


17.7: Hypotheses are statements about parameters, not sta- 
tistics. The research question is not about the sample mean (x), 
but should be about the population mean, p. 


17.9: (a) With o = 60 andn = 18, the standard deviation is 
a= ae = 14.1421, so the distribution of x is N(O, 14.1421). 
(b) P = 2P(x = 17) = 2P(Z = thd) = 0.2302 


17.11: (a) The P-value for x = 48.7 is 0.1357. This is not 
significant at either a = 0.05 or a = 0.01. (b) The P-value for 
x = 47.4 is 0.0047. This is significant at both a = 0.05 and 
a = 0.01. (c) If w = 50 (if Hp were true), observing a value 
similar to 48.7 would not be too surprising, but 47.4 is much less 
likely, and it therefore provides strong evidence that wu < 50. 


17.13: (a) z= vie = “3ST = 0.9488 (b) x = Wit, = 
“03162 = 3.226 (c) z 60/ VI8 14.1421 1.2021. Note that 
in part (c) the test is two-sided, while in parts (a) and (b), it is 


one-sided. 


17.15: PLAN: Let wu be the average percentage tip for all 
customers receiving bad news. We test Hy: w = 20 against H,: 
pe < 20. SOLVE: We have a random sample of n = 20 custom- 
ers and were told to assume tips have a Normal distribution. 
X = 18.19%. o(X) = vag = 0.4472, s0z = Si”? = —4.05. 
The P-value is P(Z = —4.05) ~ 0. CONCLUDE: There is 
overwhelming evidence that the average tip percentage when 
bad news is delivered is lower than the average tip percentage 
overall (20%). 


17.17: x = 1.876 is not significant at the a = 0.05 level 
because x is not larger than 1.96 or less than —1.96. It is also 
not significant at a = 0.01 because |z| < 2.576. 


17.19: It appears that six samples had a median of 33, three 
had a median of 34, and we were told that eight had a median 
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of 35 or larger. That means the estimated P-value is 17/1000 = 
0.017. If the median had been 30, we’d estimate a P-value of 
about 0.109 (about 50 samples had a median of 30, 30 had a 
median of 31, and 12 had a median of 32). 


17.21: (a) This is the definition of a P-value. 


17.23: (c) The P-value for zy = 2.41 is 0.0080 (assuming that 
the difference is in the correct direction). 


17.25: (a) The null hypothesis states that ys takes on the 
“default” value, 18 seconds. 


17.27: (c) A small P-value means we should not (or should 
rarely) find an observed difference as large as or larger than 
what was seen in Hp is true. The P-value does not tell us 
whether the difference seen is “large” or “practically impor- 
tant,” nor does it refer to the probability Hp is true. 


17.29: (a) This is a one-sided alternative, so we have 
0.005 in the right tail of the Normal distribution, leading to 
zx > 2.576. 


17.34: (a) Ho: w= Ovs. H,: w > 0. (b) x = Ferme = 13.29 
(c) The P-value is essentially 0. Under Ho, it would be virtually 
impossible to observe a sample mean as large as 2.35 based on 
a sample of 200 men. This sample mean cannot explained by 
random chance, and we would easily reject Ho. 


17.33: “P< 0.05” does mean that Hp is not likely to be correct 
... but only in the sense that it provides a poor explanation of 
the data observed. It means that if Hp is true, a sample as con- 
trary to Hy as ours would occur by chance alone less than 5% of 
the time if the experiment was repeated over and over. It does 
not mean that there is a less than 5% chance that Hp is true. 


17.35: The person is confusing practical significance with 
statistical significance. In fact, a 5% increase isn’t a lot in a 
pragmatic sense. However, P = 0.03 means that random 
chance does not easily explain the difference observed. 


17.37: In the sketch, the “significant at 1%” region includes 
only the dark shading (zy > 2.326). The “significant at 5%” 
region of the sketch includes both the light and dark shading 
(zy > 1.645). If P < 0.01, we must have P < 0.05. The converse 
is false: something that occurs “less than 5 times in 100 repeti- 
tions” is not necessarily as rare as something that happens “less 
than once in 100 repetitions,” so a test that is significant at the 
5% level is not necessarily significant at the 1% level. 


17.39: Because a P-value is a probability, it can never be 
greater than 1. The correct P-value is P(Z = 1.33) = 0.0918. 


17.41: PLAN: We will test the hypotheses Ho: up = 0% against 
H,: jb < 0%. SOLVE: x = —3.587%, x = Jsrvqr’ = —9.84, 
and P = P(Z = —9.84) ~ 0. CONCLUDE: There is over- 
whelming evidence that, on average, nursing mothers lose 
bone mineral. 


17.43: (a) We test Ho: w = 0 vs. H,: ww > 0, where pw is the 
mean sensitivity difference in the population. (b) PLAN: We 
test the hypotheses stated in part (a). SOLVE: x = 0.10125, 
z= Oe. = 1.84 and P = 0.0329. CONCLUDE: The 
sample gives significant evidence (at a = 0.05) that eye grease 
increases sensitivity. 


17.45: (a) No, because 33 falls in the 95% confidence inter- 
val, which is (27.5, 33.9). (b) Yes, because 34 does not fall in 
the 95% confidence interval. 


Chapter 18 _ Inference in Practice 


18.1: The most important reason is (c); this is a convenience 
sample consisting of the first 20 students ona list. This is not an 
SRS. The other two reasons are valid, but less important issues. 
Reason (a)—the size of the sample and large margin of error— 
would make the interval less informative, even if the sample 
were representative of the population. Reason (b)—nonre- 
sponse—is a potential problem with every survey, but there is 
no particular reason to believe it is more likely in this situation. 


18.3: The day after Thanksgiving is widely regarded as a day 
on which retailers offer great deals—the kinds of shoppers 
found that day probably don’t represent shoppers generally. 
Also, the sample isn’t random. 


18.5: You cannot conclude this. The restaurant you work at is 
most likely different in many ways from the one where the experi- 
ment took place. We cannot talk about 95% of individual days; the 
confidence interval is for the average tip in a long sequence of tips. 


18.7: The only source of error included in the margin of error 
is that due to random sampling variability, so (c). 


18.9: (a) and (b) The results and the curve for n = 9 are 
shown. We see that as the sample size increases, the same 
difference between fy and x goes from being not at all signifi- 
cant to highly significant. 


Ho: w=|5 
Sample Mean =4.8 
P-value = 0.1587 
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n P-value 


9 0.1587 
16 0.0912 
36 0.0228 


64 0.0038 


18.11: (a) Each test (subject) has a 5% chance of being 
deemed “significant” at the 5% level when the null hypothesis 
(no ESP) is true. With 1000 tests, we’d expect 50 such occur- 
rences. (b) Retest the nine promising subjects with a different 
version of the test. 


18.13: For a margin of error +10, we need at least 
n = (28)02))2 = 477.82, son = 423. 


18.15: (a) Increase power by taking more measurements. 
(b) If you increase a, you make it easier to reject Hg, and so 
increase power. (c) A value of = 260 is even further from 
the stated value of x = 243 under Ho, so power increases. 


18.17: As o decreases, power increases. More precise mea- 
surements increase the researcher’s ability to recognize a false 
null hypothesis. 


o 55 40 30 
0.290 0.442 0.639 


Power 


18.19: (a) All statistical methods are based on probability 
samples. We must have a random sample in order to apply them. 


18.21: (b) Inference from a voluntary response sample is 
never reasonable. Online web surveys are voluntary response 
surveys. 


18.23: (a) There is no control group. Any observed improve- 
ment may be due to the treatment, or may be due to some 
other cause. 


18.25: (a) The significance level (a) is the probability of 
rejecting Hy when Hy is true. 


18.27: (c) Power describes the test’s ability to reject a false Hp. 


18.29: We need to know that the samples taken from both 
populations are random. Are the samples large? 


18.31: Many people might be reluctant to relate details of 
their sex lives, or perhaps some will be inclined to exaggerate. 
It would not be surprising that such an estimate would be 
biased, but this author cannot guess the direction. 


18.33: The effect is greater if the sample is small. With a 
larger sample, the impact of any one value is small. 


18.35: Opinion—even expert opinion—unsupported by 
data is the weakest type of evidence, so the third description 
is level C. The second description refers to experiments (clin- 
ical trials) and large samples; that is the strongest evidence 


(level A). 


18.37: (a) The P-value decreases. (b) Power increases. 
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18.39: (a) x = 9.524, x = “She* = 1.704, P = 2P(Z = 
1.704) = 0.0884 (using software). This is not significant at the 
5% level of significance. We would not reject 8 as a plausible 
value of w, even though pw = 10. (b) The small sample size 
makes it difficult to detect a difference that is really there. 


18.41: (a) “Statistically insignificant” means that the differ- 
ences observed were no more than might have been expected 
to occur by chance even if SES had no effect on LSAT results. 
(b) If the results are based on a small sample, then even if the 
null hypothesis were not true, the test might not be sensitive 
enough to detect the effect (ie., have low power). Knowing 
the effects were small tells us that the test was not insignificant 
merely because of a small sample size. 


18.43: n = (C2900) = 96,04. Take n = 97. 


18.45: (a) This test has a 20% chance of rejecting Hp when 
the alternative is true. (b) If the test has 20% power, then 
when the alternative is true, it will fail to reject Hp 80% of the 
time. (c) The sample sizes are very small, which typically leads 
to low-power. 


18.47: From the applet, against the alternative pu = 10, 
power = 0.609. 


18.49: (a) Because the alternative is uw # 8, we reject 
Ho at the 5% level when z = 1.96 or zy S —1.96. (b) x = 
Te = 1.118(x- 8). We reject Hp if 1.118(x — 8) = —1.96 
or if 1.118(% — 8) = 1.96. Solving for xX, we reject Hp if 
x = 6.247 or xX = 9.753. (c) When w = 10, the power is 
P(x S 6.247) + P(x = 9.753) = P(Z Ss oe) +P(Z= 
Ts) = P(Z = —4.20) + P(Z = —0.28) = 0.0000 + 
0.6103 = 0.6103. 

18.51: Power = 1 — P(Type Il error) = 1 — 0.36 = 0.64. 


18.53: (a) In the long run, this probability should be 0.05. 
Out of 100 simulated tests, the number of false rejections will 
have a binomial distribution with n = 100 and p = 0.05. Most 
students will see between 0 and 10 rejections. (b) If the power 
is 0.812, the probability of a Type II error is 0.188. Out of 100 
simulated tests, the number of false non-rejections will have 
a binomial distribution with n = 100 and p = 0.188. Most 
students will see between 10 and 29 non-rejections. 


Chapter 19 From Data Production to Inference: 
Part Ill Review 


19.1: (a) S = {Male, Female}. (b) S = {6, 7, 8,..., 19, 20}. 
(c) S = {All values 2.5 = VOP = 6.1 liters per minute}. 
(d) S = {All heart rates such that heart rate > __ bpm}; 
(students may select a reasonable minimum). 


19.3: (a) 1 — (0.68 + 0.18 + 0.10 + 0.02) = 0.02 
19.5: Y>1,orY = 2; P(Y=2) =1-027=0.23 
19.7: (d) 1 — 0.34 = 0.66 


19.9: P(X S 2) is the probability of randomly choosing a 
woman between the ages of 15 and 44 who has given birth to 
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two or fewer children. P(X S 2) = 0.471 + 0.169 + 0.204 = 
0.844. 


19.11: P(X = 3) = 0.104 + 0.034 + 0.018 = 0.156 


19.13: (a) The height of the density curve is 1/5 = 0.2, 
because the area under the density function must be 1. 


19.15: There is no area above X = 5 so P(4<X <7) = 
P4<X 5) =1/5=0.2. 


19.17: (c) mean = 100, standard deviation = 15/V/60 = 1.94. 


19.19: The answer in Exercise 19.16 would change, since this 
refers to the population distribution, which is now non-Normal. 
The answer in Exercise 19.17 would not change—the mean of x 
is 100, and the standard deviation of x is 1.94, regardless of the 
population distribution. The answer in Exercise 19.18 would, 
essentially, not change. The central limit theorem tells us that 
the sampling distribution of x is approximately Normal when n 
is large enough, no matter what the population distribution. 


19.21: If the population we’re sampling from is heavily 
skewed, then a larger sample is required for the central limit 
theorem to apply. Ifn = 15, the sampling distribution of x may 
not be approximately Normal, but if n = 150, it will surely be 
approximately Normal. 


19.23: (c) 357 + 1.962% = 322.35 to 391.65 

19.25: 357 + 1.2822% = 334.34 to 379.66 

19.27: (b) 172 + 1.64544 = 172 + 18.03 mg/dL. 
19.29: (c)n = (E%2ED)2 = 181.95, which rounds up to 182. 


19.341: (c) We want to know if your college differed. 
Hypotheses are in terms of population parameters, not sam- 
ple statistics. 


19.33: (c) a = 0.10 but not at a = 0.05. The P-value is 
0.0721. 


19.35: (dd) z= ary i = 14.538. The actual P-value is 
essentially 0. 

19.37: Ho: w = 100 vs. Hz: w < 100; z = VIS. = —8.79; 
P-value ~ 0. This is overwhelming evidence that the mean 
IQ for the very-low-birth-weight population is less than 100. 


19.39: P = 0.74 means that the observed difference is easily 
explained by random chance. P = 0.013 means that the 
observed difference was unlikely to have occurred by chance 
alone; such a result (or something more extreme) would be 
expected only 13 times in 1000 repetitions of this study. 


19.41: (b) This is a personal probability (Byron’s opinion), 
and not something based on many repetitions of a football sea- 
son (which would be impossible). 


19.43: (b) P(over 25 or local) = P(over 25) + P(local) — 
P(over 25 and local) = 0.7 + 0.25 — 0.05 = 0.90 


19.45: (b) P(over 25 | not local) = P(over 25 and not 
local)/P (not local) = 0.65/0.75 = 0.867 


19.47: (a) This is binomial with n = 5 and p = 0.4. 


19.49: P(X <600) = P(X = 599) = P(Z < —1.37) = 0.0853. 
19.51: (a) All probabilities are between 0 and 1, and their 
sum is 1. (b) Let R, be Taster 1’s rating and R, be Taster 2’s rat- 
ing. Add the probabilities on the diagonal (upper left to lower 
right): P(R; = R,) = 0.05 + 0.08 + 0.25 + 0.18 + 0.08 = 
0.64. (c) P(R, > R,) = 0.18. This is the sum of the ten num- 
bers in the “lower left” part of the table. P(R; > R,) = 0.18; 
this is the sum of the ten numbers in the “upper right” part of 


the table. 


19.53: (a) Nearly all the individual students should be in the 
range w+ 30 = 3.3 + 3(0.8) = 0.9 to 5.7. (b) The sample 
mean X has a N(u, o/ V100) = N (3.3, 0.08) distribution, so 
nearly all such means should be in the range 3.3 + 3(0.08) = 
3.3 + 0.24, or 3.06 to 3.54. 


19.55: (a) The stemplot confirms the description given in 
the text. (Arguably, there are two high “mild outliers,” although 
the 1.5 X IQR criterion only flags the highest as an outlier.) (b) 
PLAN: Let pw be the mean body temperature. Hy: pw = 98.6° 
vs. H,: w # 98.6°. SOLVE: Assume we have a Normal distri- 
bution and an SRS. x = 98.203°, so z a ape = —2.54. 
P = 2P(Z < —2.54) = 0.0110. CONCLUDE: We have 
fairly strong evidence—significant at a = 0.05, but not at 
a = 0.01—that mean body temperature is not equal to 98.6°. 
(Specifically, the data suggest that mean body temperature is 
lower.) 


19.57: PLAN: We will estimate w by giving a 90% confi- 
dence interval. SOLVE: Assume we have a Normal distribu- 
tion and an SRS (conditions were checked in Exercise 19.55). 
With x = 98.203, our 90% confidence interval for pw is 
98.203 + 1.645(<45) = 98.203 + 0.257. CONCLUDE: We 
are 90% confident that the mean body temperature for healthy 
adults is between 97.95° and 98.46°. 


19.59: For the two-sided test Hp: M = $50,000 vs. H,: 
M # $50,000 with significance level a = 0.10, we can 
reject Hy because $50,000 falls outside the 90% confidence 


interval. 


19.61: Let H be the event “student was home schooled.” 
Let R be the event “student attended a regular public school.” 
We want P(H | not R). Note that the event “H and not R” = 
“H.” Then P(H | not R) = petty = 2 SSsg = 0.025. 


19.63: (a) For n = 12 such users, the distribution of 
those who have visited an auction site in the past month is 
binomial with n = 12 and p = 0.50. P(X = 8) = (7)(0.5)8 
(1 — 0.5)"~§ = 0.1208. (b) With n = 500, the distribution of 
those who have visited online auction sites will be approxi- 
mately Normal because np = n(1 — p) = 500(0.5) = 250, 
which is at least 10. The standard deviation is o = 
V500(0.5)(1 — 0.5) = 11.18. P(X = 235) = P(Z = —1.34) = 
0.9099. 


CHAPTER 20 @ Inference about a Population Mean 


19.65: A Type I error means that we conclude the mean IQ 
is less than 100 when it really is 100 (or more). A Type II error 
means that we conclude the mean IQ is 100 (or more) when it 
really is less than 100. 


Chapter 20 
20.1: SE =s/Vn = 56.9/V1000 = 1.7993 minutes 
20.3: (a) t* = 2.353 (b) t® = 2.485 


20.5: (a) df=12—1=11,sot* = 2.201 (b) df =2 —1=1, 
so t® = 63.657 (c) df = 1001 — 1 = 1000, so t* = 1.646 


20.7: PLAN: We will estimate ys with a 99% confidence 
interval. SOLVE: We are told to view the observations as an SRS. 
A stemplot shows some possible bimodality, but no outliers. 
With x = 62.1667% and s = 5.806% correct identifications, 
and t* = 2.807 (df = 23), the 99% confidence interval for wu 
is 62.1667 + 2.807%52 = 62.1667 + 3.3267. CONCLUDE: 
We are 99% confident that the mean percent of correct 
answers to identifying the taller of two people by voice is 


between 58.84% and 65.49%. 


20.9: (a) df=5 — 1 =4 (b) t = 2.50 is bracketed by t* = 2.132 
(with two-tail probability 0.10) and t* = 2.776 (with two-tail 
probability 0.05). 0.05 < P < 0.10. (c) This test is significant at 
the 10% level since P < 0.10. It is not significant at either the 
5% or 1% levels. (d) From software, P = 0.0668. 


20.11: PLAN: Take pw to be the mean difference (with eye 
grease minus without) in sensitivity. We test Hp: w = O versus 
H,: w > 0, using a one-sided alternative because if the eye 
grease works, it should increase sensitivity. SOLVE: We must 
assume that the athletes in the experiment can be regarded as 
an SRS of all such athletes and that the treatments were ran- 
domized. We were provided the differences for each athlete; a 
stemplot of these differences (provided) seems to show two 
outliers; checking with the 1.5 X IQR rule, these are not outli- 
ers (Q; = —8, Q; = 26, and the upper fence is 77). However, 
P-values will only be approximate due to the skew and rela- 
tively small sample size. x = 0.1012 and s = 0.2263, so 
t = oye37vie = 1-79 with df = 15. Using Table C, P < 0.05 
(software gives 0.0469). CONCLUDE: We have evidence that 
eye grease does increase sensitivity to contrast. Due to the 
skew in the data, we may not want to place a lot of emphasis 
on this result. 
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20.13: The stemplot suggests that the distribution of nitro- 
gen contents is heavily skewed with a strong outlier. Although 
t procedures are robust, they should not be used if the popula- 
tion being sampled is this heavily skewed. 


20.15: (a) SE = 1.2/V10 = 0.379. (b) If you use Minitab, 
create a command file with a .MTB extension using a text 
editor such as Notepad. This author’s commands are shown. 


random 10 ¢2; 
Normal 0.3 1.2. 

let c3(k1) =mean(c2) 
let kL=k1+1 


From descriptive statistics on the randomization distribution, 
the standard deviation of the 1000 random sample means is 
0.3861; this is our estimate of the standard error, which is close 
to the value from the typical formula. 


Descriptive Statistics: C3 


Variable N N* Mean SE Mean StDev 
C3 1000 0 0.3149 0.0122 0.3861 
20.17: (b) We virtually never know the value of a. 

20.19: (c) df=16—-1=15 

20.21: (b) 1.476. Here, df = 5. 

20.23: (a) $38,808 to $51,192. The interval is computed as 


45,000 + 2.06422. 


20.25: (b) If you sample 225 unmarried male students, and 
then sample 225 unmarried female students, no matching is 
present. 


20.27: For the student group: t = TERY = 0.749. For the 
non-student group: t = OaTviE = 3.277. From Table C, the 
first P-value (assuming a two-sided alternate hypothesis) is 
between 0.4 and 0.5 (software gives 0.47), and the second 
P-value is between 0.005 and 0.01 (software gives 0.007). 


20.29: (a) With n = 1400, the CLT says the sample mean is 
approximately Normal, no matter the original distribution. (b) 
From Table C, take t* = 2.581 (df = 1000), or using software 
take t* = 2.579. For either value, the 99% confidence interval 
is 248 + t*(1) = 245.4 to 250.6, rounded to one decimal place. 
(c) Because the 99% confidence interval for yz is entirely above 
243, we can believe the mean for all Dallas children is above 
basic. 


20.31: (a) A subject’s responses to the two treatments would 
not be independent. (b) This is a two-sided test because the 
placebo could stimulate activity or suppress activity at this 
point in the brain. We have t = ee = —4.41. With 
df = 5, P = 0.0069, there is significant evidence of a difference. 


20.33: (a) A stemplot suggests the presence of outliers. The 
sample is small and the stemplot is skewed, so use of t proce- 
dures is not appropriate. (b) In the first interval, using all nine 
observations, we have df = 8 and t* = 1.860. For the second 
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interval, removing the two outliers (1.15 and 1.35), df = 6 and 
t* = 1.943. The two 90% confidence intervals are 


0.549 + 1.860(43) = 0.299 to 0.799 grams, 
and 
0.349 + 1.943(°S2) = 0.310 to 0.388 grams. 


(c) The confidence interval computed without the two outliers 
is much narrower and has a lower center. Using fewer data 
values reduces degrees of freedom (and gives a larger value of t*). 
By removing two values far from the others, s reduces from 


0.403 to 0.053. 


20.35: (a) The stemplot clearly shows the high outlier men- 
tioned in the text. (b) Let w be the mean difference (control 
minus experimental) in healing rates. We test Hy: w = 0 ver- 
sus H,: w > 0. The alternative hypothesis says that the con- 
trol limb has a faster healing rate. With all 12 differences: 
X = 6.417 and s = 10.7065, so t = qOaas7vaz = 2.08. With 
df = 11, P = 0.0311 (using software). Omitting the outlier: 
X = 4.182 ands = 7.7565, sot = 7s = 1.79. With df = 
10, P = 0.052. With all 12 differences there is greater evidence 
that the mean healing time is greater for the control limb. 
When we omit the outlier, the evidence is weaker. 


20.37: (a) A histogram of the sample shows a significant 
outlier, and indicates skew. We might consider applying t pro- 
cedures to the sample after removing the most extreme obser- 
vation (37, 786). (b) If we remove the largest observation, the 
remaining sample is not heavily skewed and has no outliers. 
Now, we test Ho: 4 = 7000 versus H,: w # 7000. With 
the outlier removed, x = 11,555.16 and s = 6,095.015. 
t= ESO V = 3.258. With df = 18 with software, P = 
0.0044. There is overwhelming evidence that the mean num- 
ber of words per day for men at this university differs from 


7000. 


20.39: (a) Each patient was measured before and after treat- 
ment. (b) The stemplot of differences shows an extreme right- 
skew, and one or two high outliers. The t procedures should 
not be used. (c) Some students might perform the test 
(Ho: w = O versus H,: w > 0) using t procedures, despite the 
presence of strong skew and outliers in the sample. If so, they 
should find x = 156.36, s = 234.2952, and t = 2.213, yielding 
P = 0.0256. 


20.41: (a) We test Ho: w = 0 versus H,: wu > 0, where p is 
the mean difference (treated minus control). This is one-sided 
because the researchers have reason to believe that CQ) will 
increase growth rate. (b) x = 1.916 and s = 1.050, so 
t = Tose = 3.16 with df = 2, P = 0.044. This is significant 
at the 5% significance level. (c) For very small samples, t pro- 
cedures should only be used when we can assume that the 
population is Normal. We have no way to assess the Normality 
of the population based on these three observations. 


20.43: A stemplot reveals that these data contain two 
extreme high outliers (5973 and 8015). t procedures are not 
appropriate. 


20.45: (a) x = 48.25 and s = 40.24 thousand barrels. From 
Table C, t* = 2.000 (df = 60). Using software, with df = 63, 
t* = 1.998. The 95% confidence interval for w is 48.25 + 
2.000(S24) = 48.25 + 10.06 = 38.19 to 58.31 thousand bar- 
rels. (Using software, the confidence interval is 38.20 to 58.30 
thousand barrels.) (b) A stemplot confirms the skewness and out- 
liers described. The two intervals have similar widths, but the new 
interval is shifted higher by about 2000 barrels. Although t proce- 
dures are fairly robust, we should be cautious about trusting the 
result in part (a) because of the strong skew and outliers. The 
computer-intensive method may produce a more reliable interval. 


20.47: PLAN: We will construct a 90% confidence interval 
for pz, the mean percent of beetle-infected seeds. SOLVE: A 
stemplot shows a single-peaked and roughly symmetric distri- 
bution. We assume that the 28 plants can be viewed as an SRS 
of the population, so t procedures are appropriate. x = 4.0786 
and s = 2.0135 percent. Using df = 27, the 90% confidence 
interval for w is 4.0786 + 1.703252) = 4.0786 + 0.648 = 
3.43% to 4.73%. CONCLUDE: The beetle infects less than 5% 
of seeds, so it is unlikely to be effective in controlling velvetleaf. 


20.49: x = 0.5283, s = 0.4574, and df = 5. A 95% confi- 
dence interval for the mean difference in T cell counts 
after 20 days on blinatumomab is 0.5283 + 2.571034) = 
0.5283 + 0.4801 = 0.0482 to 1.0084 thousand cells. 


20.51: (a) For each subject, randomly select which knob 
(right or left) that subject should use first. (b) PLAN: 
We test Hy: w = 0 versus H,: w < 0, where w denotes the 
mean difference in time (right-thread time — left-thread 
time). SOLVE: A stemplot of the differences gives no rea- 
son that t procedures are not appropriate. We assume our 
sample can be viewed as an SRS. x = —13.32 seconds and 
s = 22.936 seconds, sot = aoiavE = —2,90. With df = 24 
we find P = 0.0039. CONCLUDE: We have good evidence 
(significant at the 1% level) that the mean difference really 
is negative—that is, the mean time for right-hand-thread 
knobs is less than the mean time for left-hand-thread knobs. 


20.53: Refer to Exercise 20.51. With df = 24, t* = 1.711, 
so the confidence interval for w is given by —13.32 + 
1.711(2498) = -13.32 + 7.85 = —21.2 to —5.5 seconds. Now, 
Xru/Xity = 104.12/117.44 = 0.887. Right-handers working 
with right-handed knobs can accomplish the task in about 89% 
of the time needed by those working with left-handed knobs. 


20.55: (a) Starting with Table C values, for 90% confidence, 
t(100) = 1.660 and z* = 1.645. For 95% confidence, t(100) = 
1.984 and x* = 1.96. Similarly, for 99% confidence, t(100) = 
2.626 and z* = 2.576. The differences are 0.015, 0.024, and 
0.05. Larger confidence levels will need more observations. 
We note that t(1000) is within 0.01 of z* for all these confi- 
dence levels. Using software, we find that t(150) = 1.655 for 
90% confidence, t(240) = 1.9699 for 95% confidence, and 
t(485) = 2.586 for 99.5% confidence. (b) Answers will vary. 
We'll note that the effect of the standard deviation difference 
multiplies the margin of error in the calculation by 100, which 
implies that “similar” takes more observations with o = 100 


than for 0 = 1. Using n = 485 with a = 100, the 99% t mar- 
gin of error is 11.74, compared to a 99% z margin of error of 
11.69. Using o = 1, the margins of error are both 0.117, 
rounding to three decimal places. 


Chapter 21 Comparing Two Means 


21.1: This is a matched-pairs design. Each plot is a matched 
pair. 


21.3: This involves a single sample. 


21.5: PLAN: Test Ho: wi = Mnowa versus Hy: yi > MNowiie 
We use a one-sided alternative that practice with the Wii™ 
should result in more improvement than just performing the 
same operation again. SOLVE: These data came from partici- 
pants in a randomized experiment, so the two groups are inde- 
pendent. Stemplots suggest some deviation from Normality, 
and a possible high outlier for the No Wii™ group. Boxplots 
(not shown) indicate no outliers and a relatively symmetric 
distribution for the Wii™ group, but both the —88 and 229 are 
outliers for the No Wii™ group. We proceed with the t test for 
two samples appealing to robustness (especially good with 
equal sample sizes). With xy = 132.71, Xnowi = 59.67, 
Swi = 98.44, 5) = 63.04, ny; = 21, and nw; = 21, 
SE = Vi + 2 = 25.509 and t = 1g¢-® = 2.86. Using df 
as the smaller of 21 — 1 and 21 — 1, we have df = 20, and 
0.0025 < P < 0.005. Using software, df = 34.04 and P = 
0.0036. CONCLUDE: There is very strong evidence that 
playing with a Nintendo Wii™ does help improve the skills of 
student doctors, at least in terms of the mean time to complete 
a virtual gall bladder operation. 


21.7: From Exercise 21.5, Xy,; = 132.71, Xywwy = 59.67, 
Nwi = Nnowi = 21, SE = 25.509. A 99% confidence interval 
for the mean difference in improvement in time to complete 
the virtual gall bladder operation is Xwi — Xnowa = t*SE = 
73.04 + 2.845(25.509) = 0.467 to 145.613 seconds. Soft- 
ware uses df = 34.04 and gives an interval of 3.45 to 142.65 


seconds. 


21.9: (a) Back-to-back stemplots appear to be reasonably 
Normal, and the discussion in the exercise justifies our treating 
the data as independent SRSs, so we can use t procedures. We 
test Ho: fm, = py versus H,: w, < pz, wheres; is the popula- 
tion mean time in the restaurant with no scent, and p12 is the 
mean time in the restaurant with a lavender odor. Here, with 
x, = 91.27, x = 105.700, s; = 14.930, s. = 13.105, n, = 30, 
andn, = 30:SE= V2 + 2=3.627andt = “Z* = —3.98. 
Using software, df = 57.041, and P = 0.0001. Using the more 
conservative df = 29 (lesser of 30 — 1 and 30 — 1) and Table 
C, P < 0.0005. There is very strong evidence that customers 
spend more time on average in the restaurant when the laven- 
der scent is present. (b) Back-to-back stemplots of the spend- 
ing data are skewed and have many gaps (perhaps due to 
pricing). We test Ho: wy = po versus H,: wy < pr, wheres, is 
the population mean amount spent in the restaurant with no 
scent, and 1; is the mean amount spent in the restaurant with 
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lavender odor. Here, with x, = €17.5133, x = €21.1233, 
s, = €2.3588, s) = €2.3450, ny = 30, and n. = 30: 
SE = Vi+ 2 = €0.6073 and t = =5® = -5.94. Using 
software, df = 57.998 and P < 0.0001. Using the more conser- 
vative df = 29 and Table C, P < 0.0005. There is very strong 
evidence that customers spend more money on average when 
the lavender scent is present. 


21.11: We have two small samples (n; = n; = 4), so the t 
procedures are not reliable unless both distributions are 
Normal. 


21.13: Here are the details of the calculations: 


SE atone = AS = 0.1118 
SEsyiends i = 0.1811 


SE = V SE Alone a SE Friends = 0.21283 


SE* 
a — = pQQs, = 35.284 


1 70.682 1 70.837 
36037.) + agar)” 


_ 0.29 — ( -0.19) _ 
c= 0.21283 = L295 


21.15: Reading from the software output shown in the state- 
ment of Exercise 21.13, we find that there is a significant dif- 
ference in mean perceived formidability for men alone and 
with friends (t = 2.255, df = 35.3, P <0.02). Because larger 
scores indicate greater perceived formidability, it appears that 
foes appear more formidable when alone as opposed to when 
with friends. 


21.17: The observed means are x; = 162.825 and x) = 157.6. 
The observed difference in means is 5.225. From Figure 21.6, 
the one-sided P-value is about 0.06 + 0.11 + 0.04 + 0.04 = 
0.25. We have no evidence that an increase in weeds from one 
per meter to nine per meter decreases the crop yield. 


21.19: (a) a two-sample t test. We have two independent 
populations: females and males. 


21.21: (b) Confidence levels and P-values from the t proce- 
dures are quite accurate even if the population distributions 
are not exactly Normal. 


24.23: (4) 22 = 3,05 
21 ral 


21.25: (a) We suspect that younger people use social net- 
works more than older people, so this is a one-sided alternative. 


21.27: (a) To test the belief that women talk more than 
men, test Ho: wp = by versus H,: wp > by. (b)-(d) The 
table provides a summary of t statistics, degrees of freedom, and 
P-values for both studies. We take the conservative approach 
for computing df as the smaller sample size, minus 1. 


STUDY t df TABLE C VALUES P-VALUE 
1 —0.248 55 |t| < 0.679 P>0.25 
2 1507 19 1.328<t<1.729 0.05<P<0.10 
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Note that for Study 1 we referenced df = 50 in Table C. (e) 
The first study gives no support to the belief that women talk 
more than men; the second study gives weak support, signifi- 
cant only at a relatively high significance level (a = 0.10). 


21.29: (a) Call group 1 the Stress group, and group 2 the No 
stress group. s = SEMVn, so s, = 3V20 = 13.416 and 
sy = 2V51 = 14.283. (b) Using conservative Option 2, df = 
19 (the lesser of 20 — 1 and 51 — 1). (c) We test Ho: wy = pr 
versus H.: 4 1 bo With ny =20 and n,=51, 
SE = Vi + 2 = 3.605, and t = 5 # = 4" 1.664. 

With df = 19, using Table C, 0.10 < P < 0.20. There is little 
evidence in support of a conclusion that mean weights of rats in 
stressful environments differ from those of rats without stress. 


21.31: (a) Wetest Ho: piss = ie versus H,: bis75 > M2006 
SE = Vo8t + $8 = 0.02928, ¢ = #53 = 1.708. This 
is significant at the 5% level: P = 0.0439 (df = 2353.38) o 
0.025 < P < 0.05 (df = 1000). There is good evidence 
mean job satisfaction decreased from 1975 to 2006. (b) The 
difference is barely significant at the 0.05 level (most likely due 
to the large sample sizes). Knowing that 1975 had the highest 
mean job satisfaction score in this time period casts doubt 
about whether this is actually decreasing. Also, a difference of 
0.05 in the means may not be of practical importance. 


21.33: Let pc be the mean brain size for players who have had 
concussions, and let yc be the mean for those who have not 
had concussions. We test Ho: we = fnc versus Hy: We # Mc: 

This is a two-sided test, as we simply want to know if there is a 


difference in mean brain size. SE = V3 + 8B = 203.5803, 
and t = 734° = —3.463. Using ne conservative version 


for df (Option 2), df = 24, and 0.002 < P < 0.005. Using soft- 
ware, df = 44.43, and P = 0.0012. There is strong evidence that 
the mean brain size is different for football players who have had 
concussions as opposed to those who have not had concussions. 
(b) The fact that these were consecutive cases indicates they are 
not arandom sample of all football players who have or have not 
had concussions. That could weaken or negate the results of the 
test. We’d need more information about how and why these 
players were referred to the institute. 


21.35: (a) The hypotheses are Ho: a; = [2 versus H,: wu; > M2, 
where 4; is the mean gain among all coached students, and 


#2 is the mean gain among uncoached students. We find 
SE = Vaz + By = 3.0235, and t = 4557 = 2.646. Using 
the conservative approach, df = 100 in Table C, and we obtain 
0.0025 < P < 0.005. Using software, df = 534.45, and P = 
0.0042. There is evidence that coached students had a greater 
average increase than uncoached students. (b) The 99% con- 
fidence interval is 8 + t* (3.0235) where t* equals 2.626 (using 
df = 100 with Table C) or 2.585 (df = 534.45 with software). 
This gives either 0.06 to 15.94 points, or 0.184 to 15.816 
points, respectively. (c) Increasing one’s score by 0 to 16 points 
is not likely to make a difference in being granted admission or 
scholarships from any colleges. 


21.37: (a) Neither sample histogram suggests strong skew or 
presence of strong outliers; t procedures are reasonable. (b) Let 


4, be the mean tip percentage when the forecast is good, and 
fy be the mean tip percentage when the forecast is bad. We 
have xX, = 22.22, x), = 18.19, s; = 1.955, s) = 2.105, n,; = 20, 
and n, =_20. We test Hg: ww; = fy versus H,: wy, # M2. Here, 
SE = Vi + 2 = 0.642 and t = "5 = 6.274. Using df = 
19 (the conservative Option 2) and Table C, we have P < 
0.001. Using software, df = 37.8, and P < 0.00001. There is 
overwhelming evidence that the mean tip percentage differs 
between the two types of forecasts presented to patrons. 


21.39: Refer to results in Exercise 21.37. Using df = 19, 
t® = 2.093 and the 95% confidence interval for the difference 
in mean tip percentages between these two populations is 
22.22 — 18.19 + 2.093(0.642) = 4.03 + 1.34 = 2.69 to 5.37 
percent. Using df = 37.8 with software, t* = 2.025 and the 
corresponding 95% confidence interval is 2.73% to 5.33%. 


21.41: (a) The mean rating for those with a positive attitude 
toward Mitt is larger than the mean for those with a negative 
attitude; the standard deviations are relatively large, however. 
(b) The distribution of ratings for those with positive attitudes 
toward Mitt is somewhat right skewed (but not extremely so). 
The distribution of ratings for those with negative attitudes 
toward Mitt is fairly symmetric. A check with a boxplot 
indicates the two lowest values are not outliers. (c) We find 
SE = V S80l> 4 OST — 0.22556 and t = 2323018 — 1.452, 
for which the P- ie is 0.10 < P < 0.20 (using df = 28) or 
0.1521 (using software, with df = 55.08). There is no evidence 
of a difference in the mean trustworthiness rating of Mitt 
Romney’s face according to whether or not college students 
had a positive or negative attitude about Mitt (as compared to 
Barack Obama). 


n x s 


3.9379 
3.6103 


0.8015 
0.9127 


Positive 29 


Negative 29 


21.43: (a) Stemplots suggest that there is some skew in both 
populations, but the sample sizes should be large enough to 
overcome this problem. (b) With subscripts as assigned in the 
statement of the problem (Group 1 = Women), we test 
Ho: M1 = fy versus H,: wy > pb. We have x, = 16,496.1, 
x, = 12,866.7, s, = 7914.35, s, = 8342.47, n, = 27, and 
ny, = 20; we find SE = V2 + S4H24r — 2408.26, and 
(=e With df = 39.8 (software), P = 
0.070. Using Table C with df = 19, 0.05 < P < 0.10. There is 
some evidence that, on average, women say more words than 
men, but the evidence is not particularly strong. 


21.45: We test Ho: w; = po versus H,: w, # 2, where py is 
the mean days behind caterpillar peak for the control group, 
and p, is the mean days for the supplemented group. 
Now, with x; = 4.0, % = 11.3, s; = 3.10934, s) = 3.92556, 
n, = 6, and n,=7, we find SE = V2 + 2 = 1.95263, 
t = 42543 = —3.74. The two-sided P-value is either 0.01 < 
P <0.02 (using df = 5) or 0.0033 (df = 10.96 with software), 


agreeing with the stated conclusion (a significant difference). 
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21.47: PLAN: We test Ho: w; = m2 versus H,: wy > pe, 
where 1; is the mean time for the treatment group, and #2 is the 
mean time for the control group. The alternative hypothesis is 
one-sided because the researcher suspects that the treatment 
group will wait longer before asking for help. SOLVE: We must 
assume that the data comes from an SRS of the intended popula- 
tions. A back-to-back stemplot shows some irregularity in the 
treatment times and skewness in the control times. We hope that 
our equal and moderately large sample sizes will overcome any 
deviation from Normality. With x, = 314.0588, x, = 186.1176, 
— = a fee s, = 118.0926, n, = 17, and n, = 17, we find 

A /s. += = 50.7602, and t= 314.0588 - 586176 — 257], 
re which 0. O1 “<P <0.02 (df = 16) or P = 0. 0088 (df = 28.27). 
CONCLUDE: There is strong evidence that the treatment group 
waited longer to ask for help on average. 


21.49: PLAN: We test Ho: uw, = m, versus H,: wy, > py, 
where 1, is the mean score for the Active group, and jz) is the 
mean score for the Traditional group. SOLVE: We must assume 
that the data comes from an SRS of the intended populations. 
The stemplots for each sample show no heavy skew and no 
outliers. x; = 3.6, % = 4.74, s; = 2.41, 5, = 2.85, n; = 15, 
and n, = 23. Of course, because X; < xX), we will not con- 
clude | that pw, > pw. We find SE = Vs +* = 0.86 and 
t = 26544 = —1,32, for which P > 0.50, fepardlices of df. If 
you use ete, df = 33.42, and P = 0.9026. CONCLUDE: 
There is no support for a conclusion that active learning yields 
a higher average score than traditional learning. 


21.51: (a) Refer to Exercise 21.49 for details. For 90% confi- 
dence, t* = 1.761 (using df = 14) or t* = 1.692 (using df = 
33.42). A 90% confidence interval for fw; — pz is 3.6 — 4.74 + 
t* (0.86), or —2.65 to 0.37 (using df = 14) or —2.60 to 0.32 
(using df = 33.42). (b) Now we want a 90% confidence inter- 
val for the mean change in score for the active class. That is, 
we construct a 90% confidence interval for jz,. With 15 obser- 
vations, we have df = 14, and t* = 1.761. The confidence 
interval is given by 3.6 + 1.761245 = 2.50 to 4.70. 


21.53: Because this exercise asks for a “complete analysis” 
without suggesting hypotheses or confidence levels, student 
responses may vary. This solution gives 95% confidence intervals 
for the means in parts (a) and (b), and performs a hypothesis test 
and gives a 95% confidence interval for part (c). Note that the 
first two problems call for single-sample t procedures (Chapter 
20), while the last uses the Chapter 21 procedures. Student 
answers should be formatted according to the “four-step process” 
of the text; these answers are not formatted as such, but can be 
used to check student results. We begin with summary statistics. 


n x Ss 
Women 95 4.2737 2.1472 
Men 81 6.5185 3.3471 


A back-to-back stemplot of responses for men and women 
reveals that the distribution of claimed drinks per day for 
women is slightly skewed, but has no outliers. For men, the 
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distribution is only slightly skewed but contains four outliers. 
However, these outliers are not too extreme. In all problems, it 
seems use of t procedures is reasonable. 


(a) We construct a 95% confidence interval for p,,, the mean 
number of claimed drinks for women. Here, t* = 1.990 (df = 
80 in Table C) or t* = 1.9855 (df = 94, software), and SE = 
2.1472/V95 = 0.2203. A 95% confidence interval for j1,, is 
4.2737 = 1.990(0.223) = 3.84 to 4.71 drinks. The interval 
using software is virtually the same. With 95% confidence, the 
mean number of claimed drinks for women is between 
3.84 and 4.71 drinks. (b) We construct a 95% confidence inter- 
val for fn, the mean number of claimed drinks for men. Here, 

= 1.990 (df = 80 in Table C or software), and 
SE = 3.3471/V81 = 0.3719. A 95% confidence interval for 
Mm is 6.5185 + 1.990(0.3719) = 5.78 to 7.26 drinks. With 


95% confidence, the mean number of claimed drinks for men is 


between 5.78 and 7.26 drinks. (c) We test Hy: fn = py Versus 
H4: fin * fy. We have SE = V24#* + 23478 = 0.4322, and 


t = BI 8518 = 5,193. Regardless of the choice of df (80 
or 132.15), this is highly significant (P < 0.001). We have very 
strong evidence that the claimed number of drinks is different 
for men and women. To construct a 95% confidence interval 
for Pm — Mw, we use t* = 1.990 (df = 80) or t* = 1.9781 
(df = 132.15). Using x, — x, + t* V2 . + 2, we obtain either 
2.2448 + 0.8601 or 2.2448 + 0.8549. After rounding either 
interval, we report that with 95% confidence, on average, soph- 
omore men who drink claim an additional 1.4 to 3.1 drinks per 
day compared with sophomore women who drink. 


Chapter 22 
Proportion 


22.1: (a) The population is surgical patients. p is the propor- 
tion of all surgical patients who will test positive for Staphylo- 
1251 


coccus aureus. (b) f = g77; = 0.185, or 18.5%. 


22.3: (a) Because np = 0.90(1500) = 1350 and n(1 — p) = 
0.10(1500) = 150 and both are at least 10, the sampling 
distribution of f is approximately Normal with mean p = 0.90 
and standard deviation V%—2) = 1/0°0-—90) — 0.0077. 
(b) If n = 6000, the sampling distribution on p is approxi- 
mately Normal with mean p = 0.90 and standard deviation 


fee = 1/220 050) — 9.0039. 


22.5: (a) The survey excludes residents of the northern ter- 
titories, as well as those who have no phones or have onl y cell 
phone service. (b) fp = 132 = 0.8558 so SE = VU— 
0.009050; the 95% confidence interval is 0.8558 + 
(0.009055) = 0.8381 to 0.8735, or 83.8% to 87.4%. 


22.7: (a) fp = 7% =0.255, so the margin of error is 
1.96\/ 22230 = 0.255) 10?) = 0.0665. (b) For a +3% margin of error, 
we'll need at ‘least n = (439)(0.255)(1 — 0.255) = 810.898, 
so 811 visitors over age 65. 


Inference for a Population 


(1.96) 


22.9: SOLVE: Let p be the proportion of times a spun Belgian 
euro coin lands heads. We test Ho: p = 0.50 vs. H,: p ¥ 0.50. 
Because the sample consists of 250 trials, we expect 125 
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“successes” (heads) and 125 “failures” (tails); both of these are 
at least 10 and we assume the sample represents an SRS of 
all possible coin spins so_conditions are met. p = 7% = 
0.56 SE = Vella bl — 1/0300 059) — 0.9316. x= bab = 
056-020 — 1.90, did P = 0.0574. CONCLUDE: There is some 
evidence that the proportion of times a Belgian euro coin spins 
heads is not 0.50; the P-value is close to 0.05, but not less than 
0.05. Perhaps more spins would be conclusive. 


22.11: (a) The number of trials is not large enough. The 
expected number of successes (heads) and the expected 
number of failures (tails) are both 5. These should be 10 or 
more. (b) As long as the sample can be viewed as an SRS, a 
z test for a proportion can be used. (c) Under the null 
hypothesis, we expect only 200(0.01) = 2 failures. We 
should have at least 10 expected failures and at least 10 
expected successes. 


22.13: The large-sample conditions are met because we had 
113 people who have experienced computer crime and 1025 — 
113 = 912 people who have not; both are at least 15. 
(a) p = #3 =0.1102, and SE; = V2C>® = 0.0098. A 
95% confidence interval for p is given by 0.1102 + 


1.96(0.0098) = 0.0910 to 0.1294. (b) Using the plus four 
method, 6 = 344 = 0.1118, and SE, = V#i5.2 = 0.0098. 


A 95% confidence interval for p is given by 0.1118 + 
1.96(0.0098) = 0.0926 to 0.1310, or 9.3% to 13.1%. These 
intervals are virtually identical, but the plus four confidence 
interval is very slightly shifted to the right. 


22.15: (b) The sampling distribution of f has mean p = 
0.60. 


22.17: (b) The 90% confidence interval is 0.80 + 
1.645/ 28011 = 0.80) 

4500 
22.19: () n = (©)p*(1 — p*) = (F3)0.5)0.5) = 4147.36, 


so we would need at least 4148. 


22.21: 
of error. 


22.23: 


(a) Sources of bias are not accounted for in a margin 


(c) The P-value is 0.0057. 


22.25: (a) The survey excludes those who have no phones or 
have only cell-phone service. (b) We have 848 “Yes” answers 
and 162 “No” answers; both of these are at least 15. With the 


sample proportion p = 3 = 0.8396, the large sample 95% 
confidence interval is 0.8396 + 1.96 \/ 2826 — 0836) — ¢ 8170 
to 0.8622. 


22.27: (a) fp = 3S = 0.8396, SE = 0.01155, so the margin 
of error is 1.96SE = 0.02263 = 2.26%. (b) If instead f = 
0.50, then SE = 0.01573 and the margin of error for 95% 
confidence would be 1.96 SE = 0.03084 = 3.08%. (c) For 
samples of about this size, the margin of error is no more than 
about +3% no matter what is. 


22.29: (a) The survey excludes residents of Alaska and 
Hawaii, and those who do not have cell-phone service. (b) 
We have 422 successes and 2063 failures (both at least 15), 


so the sample is large enough to use the large-sample 
confidence interval. We have p = 35 = 0.1698, and 
SE = V Ses 2) = 0.0075. For 90% confidence, the 
margin of error is 1.645SE = 0.0124 and the confidence inter- 
val is 0.1574 to 0.1822, or 15.7% to 18.2%. (c) Perhaps people 
who use the cell phone to search for information online are 
younger, and more interested in sexually related topics. 


22.31: (a) Large-sample methods are safe because we have 
171 successes and 880 — 171 = 709 failures. We have 
6 = =0.1943, SE = Vosu os) — 0.01334, the 
margin of error is 1.96SE = 0.02614, and the 95% confi- 
dence interval is 0.1682 to 0.2204. (b) It is likely that more 
than 171 respondents have run red lights. We would not 
expect people to claim that they have run red lights when 
they have not, but some people will deny running red lights 
when they have. 


22.33: (a) The margins of error are 1.96V%452 = 
0.196V p(1 — p) j{below). (b) With n = 500, the margins of 
error are 1.96V°U—) = 0.088Vf(1 — f). The new mar- 


gins of error are less than half their former size. 


p 0.1 0.3 0.5 0.7 0.9 
(a) me. | 0.0588 0.0898 0.0980 0.0898 0.0588 
(b) mee. | 0.0263 0.0402 0.0438 0.0402 0.0263 


22.35: PLAN: With p representing the proportion of songs 
downloaded by Rina, we test Hp: p = 0.50 versus H,: p # 0.50. 
The test is two-sided because we wonder if the proportion 
loaded by Rina differs from that loaded by Ed. SOLVE: We 
assume that the 50 songs sampled are an SRS. With 50 songs, 
we expect 50(0.50) = 25 each successes and failures (both at 


least ae so conditions for use of the test are satisfied. 
p = 3 = 0.68, so x = AS—8*_ = 2.55 and P = 0.0108. 


50 
CONCLUDE: There is strong evidence that the proportion 
of songs downloaded by Rina differs from 0.50. In fact, it 
seems that Rina downloaded more than Ed. (b) The condi- 
tions for a large sample confidence interval are met because 34 
of the sample’s 50 songs were downloaded by Rina and 16 by 
Ed; both of these are larger than 15. The 95% confidence 
interval _for the proportion downloaded by Rina is 0.68 + 
1.96 VESG— 88) — 0.5507 to 0.8093. At 95% confidence, 
Rina has downloaded between about 55% and 81% of the 
songs on their player. 


22.37: (a) Ho: p = 0.15 and H,: p > 0.15. Here, we expect 
0.15(61) = 9.15 < 10 “successes”; there is a possible problem 
with the use of the z test in this case. Whether we can view this 
particular class as a simple random sample (of all this teacher’s 
students? of all AP statistics students?) is questionable. The 
chance of obtaining a sample proportion at least as large as 
p = 8 = 0.2459 if the true proportion is p = 0.15 is 0.0178 
(1.78%) because z ee = 2.10. (c) Answers will 
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vary. This was not a designed, randomized experiment, so we 
cannot say the cash incentive “caused” the increase in 5s. 


22.39: (a) PLAN: Hy: p = 0.50 and H,: p # 0.50. The 
alternate is two-sided because we want to know if subjects are 
not equally likely to choose either of the two positions. SOLVE: 
We assume we have an SRS from the population. With 32 sub- 
jects, we expect 16 successes (people that picked the first 
wine) jand 16 failures (people that picked the second wine). 
pb = $ = 0.6875 and x = VESHOSOTSE = 2.12 with P-value 

2(0.0170) = 0.0340. CONCLUDE: We have strong evidence 
that people are not equally likely to choose either of two options. 
It appears they are more likely to select the first presented wine 
as their preference. (b) People that would go out of their way to 
participate in such a study are presumed to represent the popu- 
lation of all wine drinkers (or adults). The assumption that we 
have a simple random sample may not be reasonable. 


22.41: PLAN: We obtain the sample size required to estimate 
the proportion of wine tasters that select the first choice to 
within +0.05 with 95% confidence. SOLVE: We guess that the 
unknown value of p is 0.6875, as computed in Exercise 22.39. 
n = (£)p"(1 — p") = (538)2(0.6875)(1 — 0.6875) = 330.14, 
so take n = 331. CONCLUDE: We require a sample of at least 
331 wine tasters in order to estimate the proportion that would 
choose the first option to within 0.05 with 95% confidence. 


22.43: PLAN: We will give a 90% confidence interval for 
the proportion of all Krameria cytisoides shrubs that will 
resprout after fire. SOLVE: We assume that the 12 shrubs in 
the sample can be treated as an SRS. Because the number of 
resprouting shrubs is just 5, the conditions for a large sample 
interval are not met. Using the plus four method: 
D = 744 = 0.4375, SE = 0.1240, the margin of error is 
1.645SE = 0.2040, and the 90% confidence interval is 0.2335 
to 0.6415. CONCLUDE: We are 90% confident that the pro- 
portion of Krameria cytisoides shrubs that will resprout after fire 
is between about 0.23 and 0.64. 


Chapter 23 Comparing Two Proportions 


23.1: PLAN: Let py be the proportion of all males who have 
used the Internet to search for health information and p, be the 
proportion of females who have done so. We want a 95% confi- 
dence interval for the difference in these proportions. SOLVE: 
The samples were large with clearly more than 10 “successes” and 
“failures” in each sample. fy = fog = 0.6200 and py = fay = 
0.4797. SE = \/Ss2— 0.62) 4 047971 — 04797) — 0.09203, (0.62 — 
0.4797) + 1.96(0.0203) = 0.1403 + 0.0398. CONCLUDE: 
We are 95% confident that between 10% and 18% more women 
than men have looked for health information on the Internet. 


23.3: PLAN: Let p, be the proportion of 18- to 29-year- 
olds who think claims about the environment are exagger- 
ated and p, be the proportion of those 60 and older 
who think this. We want a 95% confidence interval for the 
differences in these proportions. SOLVE: The samples 
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were large with clearly more hen 10 “successes” and ai 
ures” in each sample. fp, = $4 = 0.4628 and f, = = 
0.2988. SE = \ ss 0.4628) 4 0.2988(1 — 0.2988) — 9 9397, 
(0.4628 — 0.2988) + 1.96(0.0387) = 0.164 + 0.0759. 
CONCLUDE: Based on these samples, we are 95% confi- 
dent that between about 8.8% and 24.0% more people 60 
and older believe that claims about the environment are 
exaggerated than people 18 to 29 years old. 


23.5: PLAN: Let p; and p; be (respectively) the proportions 
of injured skiers and injured snowboarders who wear helmets. 
Ho: p; = p2 versus H,: py < po. SOLVE: The smallest count is 
96, so conditions are met. fp, = a = 0.1661 and b= = $997 = 
0.2193. B = sq¢+-Seo7 = 0.2106. SE = VA(1 — 6)(s3 + 92) = 
0.01853. z = Se G23 = _2.87, and P = 0.0021. CON- 
CLUDE: We have strong evidence (significant at a = 0.01) 
that skiers and snowboarders with head injuries are less likely 
to use helmets than skiers and snowboarders without head 
injuries. 


23.7: (a) One count is only 6, and guidelines for using the 
large-sample confidence interval call for all counts to be at least 
10. (b) The new sample s sizes are 55 and 110, | success counts 
are 7 and 46. (c) f, = $4 = 0.1273 and p) = 44 = 0.4182. 
A plus four 95% confidence interval for py — pz is (0.1273 — 
0.4182) + 1.96V/2G Gb) 4 BU) = — 0.2909 + 0.1275 = 
—0.4184 to —0. 1634. With 95% confidence, among injured 
skaters the difference between proportion with wrist guards and 
those without is between —41.8% and —16.3%. It appears that 
more injured skaters fail to wear wrist guards. 


23.9: (a) The question is, “Is there a difference?” 
23.11: (b) p = Bt 


23.13: (b) = 1 /os0il ire 0801) ores — C768) _ 9.9779, 
The margin Pa error is 1.645(0.0229) = 0.0377. 


23.15: (b) We have only three failures in the treatment 
group and only two successes in the control group. 


23.17: (a) The four counts are 117, 53, 152, and 165, so all 
counts are large enough. (b) pf, = 175 = 0.6882, and 
* 152 


bo = 317 = 0.4795; the 95% confidence interval is ; — p) + 
1.96\V/ ba bi) 4 PP) = 0.2087 + 0.0887. Based on 
these samples, between 12% and 29.7% more younger teens 
than older teens have posted false information in their online 
profiles, at 95% confidence. 


23.19: (a) One of the counts is 0. (b) The new sample size 
for the treatment group is 35, 24 of which have tumors; 
the new sample size for the control group is 20, 1 of which has 
a tumor. (c) p; = 3+4 = 0.6857 and p, = 4 = 0.05. 
The plus four 99% confidence interval is p; — p) + 
2.576V2G, } = 0,6357 + 0.2380. We are 99% 
confident aa isaisdae DNA methylation increases the inci- 
dence of tumors by between about 40% and 87%. 


zu + OS 


23.21: (a) Let p; and p; be (respectively) the proportions of 
subjects in the music and no music groups that receive a passing 
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grade on the Maryland HSA. Ho: py = p2 versus H,: p; # p>. 
= 0.870, p, = $37 = 0.750, p = $35 +5R) = 0.815. 


b. ~ bp 


z= ; = = 11.94. P < 0.0001. We have over- 
pal - Ass as om) 

whelming evidence that the proportion of music students pass- 
ing the Maryland HSA is greater than that for the no music 
group. (b) and (c) This is an observational study—people who 
choose to (or can afford to) take music lessons differ in many 
ways from those who do not. We cannot conclude that music 
causes an improvement. 


23.23: We have at least 10 successes and, a failures 
in both samples. For the music group, fi = 339 = = 0.870. 
For the no music group, b, = Sey = 0.750. f, — fo + 


bi = 339 = 


1.96 bth) + BO) = 0.100 to 0.140, or 10.0% to 
14.0%. 

23.25: (a) He: Pu _ Dr versus Hi Du F Dre: bu = iz _ 
0.1415, pr=5=0.1667, and f =0.1486. Then, 


SE = Vb bas + 2) = 0.06485, so x= Qheaes' = 
—0.39. P = 0.6966, which pom no evidence of a difference 
in failure rates. (b) fy = 349 = 0.1415, pr = fp = 0.1667, 
and p = 0.1486, but now SE = V/6(1 — 5)(ab5 + id) = 
0.01184, soz = bu be = = —2.13 and P = 0.0332. (c) We are 
asked to construct two confidence intervals—one based on 
the smaller samples of part (a) and one based on the larger 
samples of part (b). For case (a), py = 0.1415 and pp = 
0.1667, so a 95% confidence interval fy — fp + 
1.96 bi bo 4 bbe) = 0.156 to 0.1056. We note 
that because there were only seven business failures in those 
headed by women, this interval is not really appropriate. For 
case (b), py = 0.1415 and pp = 0.1667. The resulting confi- 
dence interval is py — pp + 1.96V PCL bo + bb) = 
—0.0491 to —0.0013. 


23.27: PLAN: Let pw be the proportion of whites who use 
social networking sites and py}; be the proportion of Hispanics who 
use them. We want to know if the proportions are different, so we 
will test Ho: pw = pu versus H,: pw * py. SOLVE: All sample 
counts are larger than 10 (the smallest is 43). fy = 3% = 


0.6502 and Br = 154 = 0.7208. B = i337 5 154 = 0.6575. 
— 0.6502 — 0.7208 1.75. P-value = 0.0801. 
jo 6575(1 — 06575) > 3 a) 


CONCLUDE: At the 0.05 level, we fail to reject Ho. This sur- 
vey has failed to find a difference in the proportions of whites 
and Hispanics who use social networking sites. 


23.29: PLAN: We construct a 99% confidence interval for 
pb, — pz, where p, denotes the proportion of people on Chan- 
tix who abstained from smoking, and p, is the corresponding 
proportion for the placebo population. SOLVE: the smallest 
count was 61, so the large-sample procedures are safe. 

b, = 3 = 0.4403, and b, = $4 = 0.1773. The 99% confi- 
dence interval is f; — fp, + 2. 576 Va bb + BO b) = 

0.2630 + 0.0864. CONCLUDE: We are 99% confident that 


the success rate for abstaining from smoking is between 17.7 
and 34.9 percentage points higher for smokers using Chantix 
than for smokers on a placebo. 


23.31: PLAN: Let p; be the proportion of people that will 
reject an unfair offer from another person, and p; be the pro- 
portion for offers from a computer. We test Ho: p; = p2 versus 
H,: pi > py. SOLVE: ae counts are greater than 5, so the 
conditions are met. f, = $$ = 0.4737 and fp) = & = 0.1579. 
fp = 38-738 = 0.3158, and SE = B(L — b)Gg + 38) = 

0.1066. x = 942% 5,015 — 7.96, P = 0.0015. CONCLUDE: 
There is very Sion evidence that people are more likely to 
reject an unfair offer from another person than from a computer. 


23.33: PLAN: Let p, and p, be (respectively) the proportions 
of mice ready to breed in good acorn years and bad acorn 
years. We give a 90% confidence interval for p; — p2. SOLVE: 
count is only 7, so we use the plus four method. 
= = 0.7432, and f, = +5 = 0.5789, pb: — fy + 
1 Ogee Bl + BOLE) = 0.1643 + 0.2042 = 0.0399 
to 0.3685. CONCLUDE: We are 90% confident that the pro- 
portion of mice ready to breed in good acorn years is between 
0.04 lower than and 0.37 higher than the proportion in bad 
acorn years. 


23.35: (a) This is an experiment because the researchers 
assigned subjects to the groups. (b) PLAN: Let p, and p, be 
(respectively) the proportions that have an RV infection for 
the HL+ group and control group. Ho: p; = p; versus 
H,: pi < p 2. SOLVE: We have large soba counts (49, 67, 


49, sue b= 49 pea = = 0. 4224, b= = +H — 47 — = 0, 5104, and 
p = +8 = 0.4623. SE = VA(1 — B)e + &) = 0.0688. 
= odi24 — 05104 — 1.28, P = 0.1003. CONCLUDE: We 


do not have enough evidence to reject the null hypothesis; 
there is little evidence to conclude that the proportion of 
HL+ users with a rhinovirus infection is less than that for 
non-HL+ users. 


Chapter 24 Inference about Variables: 

Part IV Review 

24.1: (c) ME = 2.056(9.3)/V27 = 3.7 

24.3: (b) t = 2.023, df = 13 

24.5: (d) Our estimate is f = 676/1760 = 0.384. 
(d) 


24.7: (d) The standard error is 


C554 — 0354)  O38KT— 0384) — 
B= Vee ee + Be ge = 0.0163. 


24.9: (a) The standard error is 0.0171. (b) A 95% confidence 
interval is 0.497 to 0.563. 


24.11: (b) 2.33 + 1.9840 using 100 df from Table C. 
24.13: 


(b) This value is from software. 


24.15: The standard deviations are larger than the 
means. Because PedMIDAS scores must be greater than or 
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equal to 0, the distributions must be right skewed. The 
sample sizes are fairly large (n = 64 and 71), so the sample 
means should be approximately Normal by the central limit 
theorem. 


24.17: (b) Use technology. 


24.19: (c) We would have to view these children as random 
samples from the larger population of children who could be in 
her class. 


24.21: (b) f = 225/757 = 0.297 


24.23: (b) 0.297 + 1.645(0.017) 


24.25: (c) It seems reasonable that the researchers suspect 
that VLBW babies are less likely to graduate from high 
school. 


24.27: (b) y= —027= a = -234 
\lie- Aaa as) 

24.29: (d)t= a = —1.25, and the test is two-sided. 

38" 54 

24.31: (b) 2 = WHC Sone 1.18, so P = 0.1190. 

24.33: p =#2 =0.76. The interval is 0.76 + 1.645 

Vee — 0.18) — 0.732 to 0.788. 

24.35: Ho: P2002 = Pro versus Hy: pr002 > pro12- P2002 = as a 


0.3687 and pro = fo = 0.3130, p = 72t+ $6, = 0.3371. 


0.3687 — 0.3130 = 4.02 and P = P(Z > 4.02), 
Jo 3371(1 - 0x7 + el 


which is close to 0. There is extremely strong evidence that 
the proportion of smokers in Alaska has decreased. How 
much of the decrease can be attributed to the campaign is 
uncertain. 


z= 


24.37: (a) Using technology, t = 0.392, df = 3.24. 


24.39: In all three cases, the observations must be able to be 
seen as random and representative samples of both types of 
diets. Also, the populations must be Normally distributed. 
Because the sample sizes are very small, this is almost impossi- 
ble to check with typical graphical methods. 


24.41: A large-sample (or plus four) confidence interval for 
a population proportion 


24.43: This is the entire population of Chicago Cubs players. 
Statistical inference is not appropriate. 


24.45: (a) two-sample test or confidence interval for differ- 
ence in proportions (b) two-sample test or confidence interval 
for difference in means (c) two-sample test or confidence 
interval for difference in proportions 
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24.47: (a) This is a matched pairs situation; the responses of 
each subject before and after treatment are not independent. 
(b) We need to know the standard deviation of the differences, 
not the two, individual, sample standard deviations. (Note 
that the mean difference is equal to the difference in the 
means, which is why we only need to know the standard devia- 
tion of the differences.) 


24.49: (a) fp = 80/80 = 1; the margin of error for 95% 
confidence (or any level of confidence) is 0 because 
“/H0 =) = 0, Almost certainly, if more trials were per- 
formed, a rat would eventually make a mistake, so the actual 
success rate is less than 1. (b) The plus four estimate is 
p = 82/84 = 0.9762, and the plus four 95% confidence inter- 
val is pte WD — 0.9762 + 0.0326 = 0.9436 to 
1.0088. Ignoring the upper limit, we are 95% confident that 
the actual success rate is 0.9436 or greater. 


24.51: PLAN: We test Ho: w = 12 versus Hy: w > 12, 
where yt denotes the mean age at first word, in months. 
SOLVE: We regard the sample as an SRS; a stemplot (not 
shown) shows that the data are right-skewed with a high 
outlier (26 months). If we proceed wh the t procedures 
despite this, we find x = 13 and s = 4.9311 months. 
t= Bute = 0.907, with df = 19, and P = 0.1879. If you 
delete the outlier mentioned above, x = 12.3158, s = 3.9729, 
and t = 0.346, yielding P = 0.3665; the conclusion will not 
change. CONCLUDE: We cannot conclude that the mean age 


at first word is greater than one year. 


24.53: PLAN: We give a 90% confidence interval for y, the 
mean age at first word, measured in months. SOLVE: See 
results from Exercise 24.51. For df = 19, t° = 1.729, so the 
90% confidence interval is 13 + 1.729 *3H = 11.09 to 14.91 
months. CONCLUDE: We are 90% sadn that the mean 
age at first word for normal children is between about 11 and 
15 months. 


24.55: (a) The design is shown. (b) PLAN: We test Hp: 
bp = Mc versus H,: ug ~ bc. SOLVE: We have xg = 41.2825 
and sp = 0.2550, and xc = 42.4925 and sco = 0.2939; 
ng = Nc = 8. SE = 0.1376, and t = @o-* = —8.79. With 
df = 7 (or 13.73 from software), P < 0. O01, Using software, 
with df = 13.73, P = 0.0000 to four places. There is over- 
whelming evidence that method B gives darker color on aver- 
age. However, the magnitude of this difference may be too 
small to be important in practice. 


Treatment 1 


Method B 
ao Compare 
Be color 
___, Treatment 2 


Method C 


Group 1 


8 pieces 
Random ee : 
assignment ~ 


Group 2 
8 pieces — 
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24.57: PLAN: We give a 95% confidence interval for p, the 
proportion of female students with at least one parent who 
allows drinking. SOLVE: We are told that the sample represents 
an SRS. Large sample methods may be used because the num- 
ber of successes and failures are both greater than 15. With 
p = 65/94 = 0.6915, we have SE = 0.04764, so the margin 
of error is 1.96 SE = 0.09337, and the interval is 0.5981 to 
0.7849. CONCLUDE: With 95% confidence, the proportion 
of female students who have at least one parent who allows 


drinking is 0.598 to 0.785. 


24.59: (a) PLAN: We want to compare the proportions 
pi (microwaved crackers that show checking) and p, (con- 
trol crackers that show checking). We can do this either by 
testing hypotheses or with a confidence interval, but because 
the “microwave checked” count is only 3, significance tests 
are not appropriate. We will use the plus four procedure 
and construct a confidence interval for p; — p). SOLVE: 
We find that p; = 444 = 0.0597 and p, = 444 = 0.8657. 
SE = Vata) + aC — be) = 0.05073, and a 95% confi- 
dence interval is given by p; — py = 1.96(0.0507) = —0.9054 
to —0.7066. CONCLUDE: We are 95% confident that micro- 
waving reduces the percentage of checked crackers by between 
70.7% and 90.5%. (b) PLAN: We want to compare 1; and jy, 
the mean breaking pressures of microwaved and control crack- 
ers. We test Ho: a, = pl versus H,: uy # fy and construct a 
95% confidence interval for wu, — f2. SOLVE: We assume the 
data can be considered SRSs from the two populations, and 
that the population distributions are not far from Normal. 
Now, SE = 9.0546 and t = 6.914, so the P-value is very small, 
regardless of whether we use df = 19 or df = 33.27. A 95% 
confidence interval for the difference in mean breaking pres- 
sures between these cracker types is 43.65 to 81.55 psi (using 
df = 19 and t° = 2.093), or 44.18 to 81.02 psi (using df = 
33.27 and t° = 2.0339). CONCLUDE: There is very strong 
evidence that microwaving crackers changes their mean 
breaking strength. We are 95% confident that microwaving 
crackers increases their mean breaking strength by between 


43.65 and 81.55 psi. 


24.61: Two of the counts are too small to perform a signifi- 
cance test safely. 


24.63: The group means are x, = 5.9 (insulin), x, = 0.75 
(glucose) ng/mL, and the standard deviations are s; = 
0.9VI10 = 2.85 and s, = 0.2V10 = 0.632 ng/mL. PLAN: 
We test Ho: mw, = pf versus H,: 4) ~ M2. SOLVE: The esti- 
mated standard error of the difference in sample means is 
SE = V0.9 + 0.2? = 0.9220, so t = 224,22 = 5.59. With 
either df = 9 or df = 9.89, P < 0.001. CONCLUDE: The 


evidence is even stronger than the paper claimed. 


24.65: The sample proportion is p = 594/1484 = 0.400. 
The 95% confidence interval for those who believe 
humans developed from earlier species of animals is 0.40 + 


1.96-V 24088) 0.375 to 0.425. 


Chapter 25 Two Categorical Variables: 
The Chi-Square Test 


25.1: (a) The table provided gives percentages in each cate- 
gory. As an example, there were a total of 1308 surveyed Cau- 
casians. Of these, 181 were under 25; the proportion of 
Caucasians surveyed who were under 25 is 181/1308 = 0.1384, 
which is represented as 13.8% in the table. 


CAUCASIAN HISPANIC 
Under 25 13.8% 24.9% 
25 to 34 24.8% 37.9% 
35 to 44 17.7% 23.3% 
45 and over 43.7% 13.9% 


Monterey Bay Aquarium Visitors (Surveyed) 


£ 
(2) 
\ 


w 
jo} 
i 


Percent of Ethnic Group 
is) 
l 


BR 
) 
1 


0-4 


AgeGrp Under 25to 35to Over 


Under 25 to 35to Over 
Ethnicity 25 34 44 45 25 34 44 45 


Hispanic Caucasian 


(b) The bar graph clearly reveals that Hispanic visitors tend to 
be younger; over 40% of Caucasian visitors are older than 45. 


25.3: (a) To test Ho: p; = p2 versus H,: p; % p> for the propor- 
tions of boys who did and did not play video games, we have 
fp; = 0.326 and ph, = 0.336. The pooled proportion is 
pb = toto bs = 0.3285, and SE = 0.02596, so z = 0.39, for 
which P = 0.6966. We conclude that we have no evidence to say 
the proportion of C students is different for boys who have (and 
have not) played video games. (b) To test Ho: p; = pz versus 
H,: p; % pz for the proportions who earn Ds and Fs, we have 
fb; = 0.1400 and pf, = 0.1865. The pooled proportion is 
p = 0.1510, and SE = 0.01979, so z = —2.35, for which P = 
0.0188. There is strong evidence that playing video games changes 
the proportion of boys who would earn D or F grade averages. (c) 
If we did three individual tests, we would not know how confident 
we could be in all three results when taken together. 


25.5: (a) Expected counts are below observed counts in the 
table provided. For example, for Hispanics under 25, the expected 
count is 2%)G!D = 50.72. (b) We find that the observed counts 
are much greater than the expected counts for all three younger 
age groups of Hispanics. The actual count of Hispanics over 45 
(44) is much smaller than the expected count (120). 
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Caucasian Hispanic All Not at 

all A little Somewhat Very Extremely All 

Under25 181 79 260 
209.3 50.7 20-29 8 29 28 13 0 78 
10.26 37.18 35.90 16.67 0.00 100.00 
25t034 324 120 444 5.73 17.57. 31.40 17.57 5.73 78.00 
357.4 86.6 0.8977 7.4379 0.3680 1.1881 5.7316 * 
35to44 232 7A 306 30-39 15 28 55 23 9 130 
ae a 11.54 21.54 42.31 17.69 6.92 100.00 
: : 9.55 29.28 52.33 29.28 9.55 130.00 
over45 571 AA 615 3.1062 0.0561 0.1360 1.3474 0.0320 * 
495.0 120.0 40-49 2 25 49 26 14-116 
1.72 21.55 42.24 22.41 12.07 100.00 
All 1308 317 1625 8.52 26.13 46.70 26.13 8.52 116.00 
4.9932 0.0487 0.1136 0.0006 3.5180 * 
25.7: (a) All expected counts are well above 5 (the smallest 50 and older 21 59 120 79 23. 302 
is 50.72). (b) We test Hy: there is no relationship between Brey Tee aaa dA 20nt8 bnee 100800 
f : e 22.19 68.02 121.57 68.02 22.19 302.00 
age and ethnic group for Monterey Bay Aquarium visitors ver- 0.0640 4.1967 0.0203 1.7716 6.0294 : 
sus H,: there is a relationship between age and ethnic group. jy, 46 iat eer ie. <éh6 
From the SAS output, we have x’ = 99.6058 and P < 0.0001. 7.35 22.52 40.26 22.52 7.35 100.00 
(c) The largest contributions generally come from the Hispan- mene Cee See eee ore eee 

ics, reflecting that group tends to visit the aquarium at younger 

Cell Contents: Count 
ages. % of Row 


25.9: PLAN: We will carry out a chi-square test for asso- 
ciation between education level and opinion about astrol- 
ogy. We test Ho: there is no relationship between education 
level and astrology opinion versus H,;: there is some relation- 
ship between education level and astrology opinion. SOLVE: 
Examining the output provided in Figure 25.5, we see that 
all expected cell counts are greater than 5 (the smallest is 
25.21), so conditions for use of the chi-square test are satis- 
fied. We see that x? = 3.618 and P = 0.16. CONCLUDE: 
There no evidence of an association between education 
level and opinion of astrology. Examining the table, we note 
that for people with graduate degrees, more than expected 
felt that astrology is not scientific. For people with a junior 
college degree, more than expected believed that astrology is 
scientific; these differences were not large enough to believe 
that the differences were due to something other than sam- 
pling variability. 


25.11: (a) Because there was one sample that was later cate- 
gorized by two variables, this is a test of independence. (b) 
PLAN: We will test Ho: there is no relationship between age and 
how politically informed the person is versus H,;: there is a rela- 
tionship between age and how politically informed the person is. 
SOLVE: Minitab output is provided. We see that all expected 
cell counts are greater than 5 (the smallest is 5.73), so condi- 
tions are satisfied. y? = 32.057 on df = 12, and P = 0.001 to 
three decimal places. CONCLUDE: Age and being informed 
about politics are related. In particular, we can see that people 
20 to 29 years old are less likely to be informed (observed counts 
are larger than expected for the cells with little awareness and 
smaller than expected for cells with much awareness). The other 
large contribution to the chi-square statistic came from people 
40 to 49 years old: fewer than expected said they were not at all 
informed. 


Expected count 
Contribution to Chi-square 


Pearson Chi-Square = 32.057, DF = 12, 


25.13: (a) df = (@r — Die -) =(2 -HB- 1 =2 
(b) The computed value (6.739) is between the table values 
5.99 and 7.38; we conclude that 0.025 < P < 0.05, which is 
consistent with output’s reported P = 0.034. (c) Under the 
null hypothesis of no association, the mean value of x? is 
df = 2. Our computed value is larger than this. The small 
P-value suggests that random chance does not easily explain 
the larger than expected value of x’. 


P-Value = 0.001 


25.15: (a) If all days were equally likely, we would have 
pi=pr= ... =p 7 and we would expect 100 births on 
each day. (b) The chi-square statistic is then X’? = 


84 — 100)? , (110 — 100) 72. — 100)? 
im Eat im ; + + P00 = 19.12. (c) df = 


7 — 1 = 6. From Table D, yx”? = 19.12 yields 0.0025 < P < 
0.005. Software gives P = 0.004. We have strong evidence that 
births are not spread evenly across the week. 


25.17: (a) 15/33 = 0.455, or 45.5% of subjects chose position 
1. Similarly, the percentages for each position are 15.2% for posi- 
tion 2, 6.1% for position 3, and 33.3% for position 4. (b) If subjects 
are equally likely to select any position, then we would expect 
33/4 = 8.25 subjects in each position. (c) PLAN: We test 
Ho: pi = po = p3 = ps = i versus H,: the four order selection 
probabilities are not equally likely. SOLVE: As computed above, 
we expect 8.25 subjects per cell under the null hypothesis, so all 
expected cell counts exceed 5. Also, we have at least one observa- 
tion per cell. Conditions for the chi-square test are satisfied. Now, 


15 — 8.25 , (5 - 825 , (2-825) , (11 — 8.25) 
= Fee + Bs ee ee = 1. 


df = 4 — | =3. From Table D, x’ = 12.45 yields 0.005 <P < 
0.01 (P = 0.006 from technology). CONCLUDE: There is 
strong evidence that positions are not selected with equal 
probability—some positions are more likely to be selected than 
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others. (d) We see that the largest contributions to x’ are from 
the first and third positions. In the first and fourth cases, we 
have more observations than expected, and in the second and 
third positions we have fewer observations than expected. 
There is evidence of both primacy and recency effects. 


25.19: (a) 295 + 655 + 239 + 363 = 1552 


25.21: (a) For those 23 to 30 years old, the percentage is 
26.5%. 


25.23: (a) The expected cell count is (1571) (1552)/4111 = 
593.09. 


25.25: (a) df=(r -—I(c-lH)=4-1(2-1) =3 
25.27: (b) This is the hypothesis of association between 
“age” and “type of injury.” 


25.29: (b) We assume that the sample is an SRS, or essen- 
tially an SRS from all weightlifting injuries. 


25.31: (a) These were separate random samples, so this is a 
test of homogeneity. (b) PLAN: We test H,: the distribution of 
age groups is the same for landline and cell-only individuals 
versus H,;: the distributions are different. SOLVE: All expected 
cell counts are more than 5, so the guidelines for the chi- 
square test are satisfied. We have yx? = 1032.892, df = 3, and 
P < 0.0005. CONCLUDE: There is strong evidence of an 
association between age group and the type of telephone. In 
fact, the younger age groups were much more likely than 
expected to have only cellphones. 


Cell Only Landline All 
Age18-29 374 335) 709 
52.75 47.25 100.00 
115 594 709 
587.04 113.20 * 
Age30-49 347 1242 1589 
21.84 78.16 166.680 
257 1332 1589 
31.63 6.10 * 
Age50-64 146 1625 1771 
8.24 91.76 100.00 
286 1485 2772 
68.75 13.26 * 
Age65up 36 1481 1517 
2.37 97.63 100.00 
245 1272 1517 
178.51 34.42 * 
All 903 4683 5586 
16.17 83.83 100.00 
903 4683 5586 
* * * 
Cell Contents: Count 
% of Row 


Expected count 
Contribution to Chi-square 


Pearson Chi-Square = 1032.892, DF = 3, P-Value = 0.000 


25.33: (a) The diagram is shown. To perform the randomiza- 
tion, label the infants 01 to 77, and choose pairs of random 
digits. (b) See the Minitab output for the two-way table. We 
find x’ = 0.568, df = 3, and P = 0.904. There is no reason to 
doubt that the randomization “worked.” 


Female Male 
NLCP 11 8 
10.36 8.64 
0.03907 0.04689 
PBM Ad 9 
10.91 9.09 
0.00076 0.00091 
PL-LCP 11 8 
10.36 8.64 
0.03907 0.04689 
TG-LCP 9 10 
10.36 8.64 
0.17943 0.21531 

Cell Contents: Count 


Expected count 
Contribution to Chi-square 


P-Value = 0.904 
P-Value = 0.904 


Pearson Chi-Square = 0.568, DF = 3, 
Likelihood Ratio Chi-Square = 0.567, DF = 3, 


Group 1 Treatment 1 
20 infants PBM 


Group 2 Treatment 2 \ 


19 infants NLCP 
Random 17 bee Observe 
assignment development 
Ss Gro a 


up 3 Treatment 3 
19 infants —° PL-LCP ye 


Group 4 Treatment 4 
19 infants TG-LCP 


25.35: (a) The two-way table follows. We test Ho: p; = p2 
versus H,: pj < p2. (b) The zx test must be used because the 
chi-square procedure measures evidence in support of any 
association and is implicitly two-sided. We have f, = 0.3667 
and fp; = 0.7333. The pooled proportion is f = +45 = 0.55, 
and the standard error is SE = 0.12845, sox = —2.85 and P = 
0.0022. We have strong evidence that rats that can stop the 
shock (and therefore presumably have better attitudes) 
develop tumors less often than rats that cannot (and therefore 
are presumably depressed). 


TUMOR NO TUMOR 
Group | 11 19 
Group 2 22 8 


25.37: PLAN: We test Ho: there is no relationship between 
sexual content of ads and magazine audience versus H,: 
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there is some relationship between sexual content of ads and 
magazine audience. SOLVE: Examining the Minitab output 
in Figure 25.9, we see that conditions for use of the chi- 
square test are satisfied because all expected cell counts 
exceed 5 (the smallest is 82.4). We have x? = 80.874 with 
df = 2, leading to P < 0.0005. CONCLUDE: Magazines 
aimed at women are much more likely to have sexual depic- 
tions of models than the other two types of magazines. Spe- 
cifically, about 39% of ads in women’s magazines show sexual 
depictions of models, compared with 21% and 17% of ads in 
general-audience and men’s magazines, respectively. The two 
women’s chi-squared terms account for over half of the total 
chi-square value. 


25.39: We need cell counts, not just percentages. If we had 
been given the number of travelers in each group—leisure and 
business—we could have estimated this. 


25.41: In order to doa chi-square test, each subject can only 
be counted once. In this experiment, each individual is repre- 
sented for both treatments (carob and chocolate). 


25.43: (a) We test Ho: there is no relationship between 
degree held and service attendance versus H,: there is some 
relationship between degree held and service attendance. 
Examining the Minitab output, y? = 14.19 with df = 3, 
P-value = 0.003. There is strong evidence of an association 
between degree held and service attendance. 


HS JColl Bachelor Graduate All 
No 880 101 232 105 1318 
842.7 107.3 248.9 119.2 1318.0 
Yes 400 62 146 76 684 
437.3 55).7 129.1 61.8 684.0 

Cell Contents: Count 


Expected count 

Pearson Chi-Square = 14.190, DF = 3, P-Value = 0.003 
(b) The new table is shown. We find xy’? = 0.73 on 
df = 2. P = 0.694. In this table, we find no evidence of asso- 


ciation between religious service attendance and degree 


held. 


JColl Bachelor Graduate All 
No 101 232 105 438 
98.9 229.3 109.8 438.0 
Yes 62 146 76 284 
64.1 148.7 Tie2 284.0 

Cell Contents: Count 


Expected count 


Pearson Chi-Square = 0.729, DF = 2, P-Value = 0.694 


(c) The new table is shown. We have x? = 13.40 with df = 1. 
P = 0.000 to three decimal places (it’s actually 0.0002). There 
is overwhelming evidence of association between level of edu- 
cation (high school versus beyond high school) and religious 
service attendance. 
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BeyondHSs HSchool All 
Attend 284 400 684 
246.7 437.3 684.0 
NoAttend 438 880 1318 
475.3 842.7 1318.0 

Cell Contents: Count 


Expected count 


Pearson Chi-Square = 13.416, DF = 1, P-Value = 0.000 


(d) In general, we find that people with degrees beyond high 
school attend services more often than expected; people with 
high school degrees attend services less often than expected. 
Of those with high school degrees, 31.3% attended services, 
and the percentages are 38.0%, 38.6%, and 42.0%, respec- 
tively, for people with junior college, bachelor’s, and graduate 
degrees. 


25.45: PLAN: We use a chi-square test to test Ho: there is 
no relationship between race and opinion about schools ver- 
sus H,: there is some relationship between race and opinion 
about schools. SOLVE: All expected cell counts exceed 5 
(the smallest is 21.26), so use of a chi-square test is appropri- 
ate. We find that x? = 22.426 with df = 8, and P = 0.004. 
Nearly half of the total chi-square comes from the first two 
terms; most of the rest comes from the second and fifth rows. 
CONCLUDE: We have strong evidence of a relationship 
between race and opinion of schools. Specifically, according 
to the sample (as illustrated in the table), blacks are less 
likely and Hispanics are more likely to consider schools to be 
excellent, while Hispanics and whites differ in the percent 
considering schools good. Also, a higher percentage of blacks 
rated schools as “fair.” 


Black Other White All 
Exclnt 12 22 34 68 
22.70 22.59 22.570 68.00 
Good 69 81 55 205 
68.45 68.11 68.45 205.00 
Fair 75 60 61 196 
65.44 65.12 65.44 196.00 
Poor 24 24 24 72 
24.04 23.92 24.04 72.00 
Don’t Know 22 14 28 64 


21.37 21.26 21537 64.00 


Count 
Expected count 


Cell Contents: 


Pearson Chi-Square = 22.426, DF = 8, P-Value = 0.004 


25.47: PLAN: We compare how the number of children 
per group has changed from 2009 through 2013 at the Mon- 
terey Bay Aquarium. We will create a bar graph and do a 
chi-square test of homogeneity (each year is a separate sam- 
ple). SOLVE: To examine any possible change in the number 
of children per group, we first look at a segmented bar graph 
of the data. The graph indicates that the number of groups 
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with no children has been steadily decreasing and the num- 
ber of groups with three or more children has been increasing 
slightly. There is some fluctuation in the number of groups 
with one or two children, but 2013 definitely has more of 
these groups than 2009. 


Monterey Bay Aquarium Kids per Group 


100 + 
80 4 
60 4 
40 4 
20 4 

0 


2009 2010 2011 2012 2013 
Year 


Percent 


NumkKids 
BM 3ormore M2 


| Ho 


First, we note that all expected counts are above 5 (the small- 
est is 49.3). The test statistic is vy? = 75.248 with df = 12. 
With P-value < 0.0005, we reject the null hypothesis that 
there has been no change in the distribution of the number of 
children per group. It seems clear (especially when comparing 
2009 to 2013) that they are meeting their goal of attracting a 
younger audience. 


Rows: NumKids Columns: Year 
2009 2010 2011 2012 2013 All 
0 390 330 342 318 268 1648 
327.6 316.7 327.6 331.5 344.6 1648.0 
aL 7B) 94 93 99 131 496 
98.6 95:..3 98.6 99.8 103.7 496.0 
2 66 96 96 111 137 506 
100.6 97.3 100.6 101.8 105.8 506.0 
3 or more 41 37 45 55 70 248 
49.3 47.7 49.3 49.9 51.9) 248.0 
All 576 557 576 583 606 2898 
576.0 557.0 576.0 583.0 606.0 2898.0 
Cell Contents: Count 


Expected count 


Pearson Chi-Square = 75.248, DF = 12, P-Value = 0.000 


25.49: PLAN: We compare the percentages leaning toward 
each party within each education group. SOLVE: The 


requested table is provided. At each education level, we 
compute the percentage leaning toward each party. For example, 
among bachelor’s degree holders, 166/(166 + 136) = 54.97% 
lean Democrat, while the other 45.03% lean Republican. 


Bachelor Graduate Highschool JrCollege None All 


Dem 166 119 454 68 127 934 
54.97 69.19 59.66 57.14 70.95 60.93 
Rep 136 53 307 51 52 599 
45.03 30.81 40.34 42.86 29.05 39.07 
Cell Contents: Count 


% of Column 


CONCLUDE: At every education level, people leaning Dem- 
ocrat outweigh people leaning Republican. The difference is 
greatest at the “None” level of education, then decreases until 
the party support is nearly equal for bachelor’s degree holders. 
Among graduate degree holders, Democrats strongly outnum- 
ber Republicans. 


Chapter 26 _ Inference for Regression 


26.1: (a) A scatterplot of the data is provided, along with the 
least-squares regression line. From software, r = 0.985. 


Is Wine Good for Women's Health? 
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Wine (g per day) 


(b) If we knew it, the slope B would tell us how much the rela- 
tive risk of breast cancer changes in women for each increase 
of 1 gram of wine per day (on average). We estimate that an 
increase in intake of 1 gram per day increases relative risk of 
breast cancer by about 0.009. According to our estimate, wine 
intake of 0 grams per day is associated with a relative risk of 
breast cancer of 0.9931 (about 1). (c) y = 0.9931 + 0.0090x. 
s = V9? = 0.01987. 


RESIDUAL 
x ¥ y yo7 mea 
23) 1.00 1.0156 —0.0156 0.00024 
8.5 1.08 1.0697 0.0103 0.00011 
15.5 1.15 1.1328 0.0172 0.00030 
26.5 1.22 1.2319 —0.0119 0.00014 
0 0.00079 


26.3: (a) A scatterplot of discharge by year is provided, 
along with the fitted regression line. Discharge seems to be 
increasing over time, but there is also a lot of variation in this 
trend, and our impression is easily influenced by the most 
recent years’ data. r’ = 0.215, so the least-squares regression 
line explains 21.5% of the total observed variability in Arctic 
discharge. (b) y = —3362 + 2.6327x, s = 110.477. 


Arctic Rivers Discharge 
2300 4 


2200 4 
2100 + e 
2000 - 
900 - 
1800 + 


1700 + 


Water Discharge (cubic km) 


600 4 


500 4 
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26.5: Hy: 8 =O versus H,:8 > 0. We compute 
t= oe = (34 = 4.47. df =n —2 = 75 — 2 = 73. In tefer- 
ring to Table C, we round df down to df = 60; P < 0.0005. 
Using software, P = 0.000 There is strong evidence of an 
increase in Arctic discharge over time. 


26.7: (a) Ho: B =0 versus H,:B > 0, t = 8.104 with 
df = 2 (all from software). 0.005 < P < 0.01. This test is 
equivalent to testing Ho: population correlation = 0 versus 
H,: population correlation > 0. (b) r = 0.985. From Table E 
with n = 4, 0.005 <P < 0.01. 


26.9: t* = 2.920 (df = 4 — 2 = 2, with 90% confidence). 
0.009012 + 2.920(0.001112) = 0.009012 + 0.003247 = 
0.00577 to 0.01226. With 90% confidence, the expected 
increase in relative risk of breast cancer associated with an 
increase in alcohol consumption by 1 gram per day is between 


0.00577 and 0.01226. 


26.11: b = 2.6327 and SE, = 0.5893. With n = 75, df = 73. 
Using Table C, use df = 60. t* = 1.671 (t* = 1.666 from 
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technology). 2.6327 + 1.671(0.5893) = 2.6327 + 0.9847 = 
1.6480 to 3.6174 cubic kilometers per year (technology: 1.6509 
to 3.6145). With 90% confidence, the yearly increase in Arctic 
discharge is between 1.6480 and 3.6174 cubic kilometers. This 
confidence interval excludes “0,” so there is evidence that 
Arctic discharge is increasing over time. 


26.13: (a) If x* = 0.65, w = 10.0655 + 86.0308 (0.65) = 
65.98552. (b) SE, = 1.47. For df = 29 — 2 = 27 and 95% 
confidence, t* = 2.052. A 95% confidence interval for mean 
brain gray matter volume in people with 0.65 AROC is given 
by 65.98552 + 2.052(1.469) = 62.971 to 69.000. 


95% CI 
(62.9714, 68.9997) 


Fit SE Fit 
65.9855 1.46902 


95% PI 
(49.4805, 82.4906) 


26.15: (a) The residual plot provided does not suggest any 
deviation from a straight-line relationship between brain vol- 
ume and Aroc score, although there are two large (in absolute 
value) residuals near the left end of the plot. Both had AROC 
scores of about 0.63, but one had his/her volume underpre- 
dicted (the positive residual) and one was overpredicted (the 
negative residual). 


25 7 
20 + 


Volume Residual 
e 
e 
e 
e 
e 


T T 1 
O55 0.60 0.65 0.70 0.75 
Aroc 


(b) A stemplot of residuals, provided, does not suggest that the 
distribution of residuals departs strongly from Normality. The 
value 22 from observation 5 may be an outlier; other than that, 
the residuals are symmetric and mound-shaped. 


4 Stem and Leaf 


Stem | Leaf 
ial 1 
ul 

2 1 
0 | 889 3 
0 | 222333334 9 
-0 | 44432110 8 
-0 | 88777 5 
-1]0 1 
-1/8 a 


-1|8 represents -18 
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(c) It is reasonable to assume that observations are indepen- 
dent because we have 29 different subjects, measured sepa- 
rately. (d) Other than the large residuals noted in part (a), 
there is no indication that variability changes; there are fewer 
individuals with low Aroc scores, so there is naturally less vari- 
ability on the left end of the plot. 


26.17: (a) With a positive association, r= +Vr? = 
+V0.579 = 0.761. 


26.19: (a) This is a one-sided alternative because we 
wonder if larger appraisal values are associated with larger 
selling prices. 


26.21: (c)s = 221.341 


26.23: (b) With 45 degrees of freedom, t° = 2.014, so the 
margin of error is 2.014(0.1540) = 0.3102. Using Table C and 
40 df, t* = 2.021 and ME = 0.3112. 


26.25: (a) Scientists estimate that each additional 1% 
increase in the percentage of Bt cotton plants results in an 
average increase of 6.81 mirid bugs per 100 plants. (b) The 
regression model explains 90% of the variability in mirid bug 
density. (c) Hy: 8 =0 versus H,: 8B > 0 (H): population 
correlation = 0 versus H,: population correlation > 0). P < 
0.0001; there is strong evidence of a positive linear relation- 
ship between the proportion of Bt cotton plants and the den- 
sity of mirid bugs. (d) We cannot conclude a causal relationship; 
this was not a designed experiment. 


26.27: For 90% intervals with df = 10, t = 1.812. 
(a) 274.78 + 1.812(88.18) = 274.78 * 159.78 = 115.0 to 
434.6 fps/inch. (b) This is the “90% CI” given in Figure 26.13: 
176.2 to 239.4 fps. To confirm this, y = 207.8 and SE, = 17.4. 
y + t° SE, = 207.8 + 1.812(17.4) = 176.3 to 239.3 fps, 
which agrees with the output up to rounding error. 


26.29: (a) There is little evidence of non-Normality in 
the residuals. (b) The scatterplot confirms the comments 
in the text: There is no clear pattern, but the variability 
about the “residual = 0” line may be slightly greater when 
x is larger. (c) Presumably, close inspection of a manatee’s 
corpse will reveal nonsubtle clues when cause of death is 
from collision with a boat propeller. It seems reasonable 
that the kills listed in the table are mostly not caused by 
pollution. 


-1|86 
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Manatee Residuals vs. Boats 
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26.31: (a) With df = 35 (rounding df down to 30), t* = 
2.042. 0.132259 + 2.042(0.007067) = 0.132259 + 0.014431 = 
0.11783 to 0.14669 additional killed manatees per 1000 addi- 
tional boats. (Using technology, 0.11791 to 0.14661.) (b) 
J = —44.831 + 0.132259(900) = 74.2021, which agrees 
with Figure 26.14 under “Fit.” We need the prediction interval 
because we are forecasting the number of manatees killed for a 
single year. A 95% prediction interval for the number of killed 
manatees is 57.74 to 90.67 kills if 900,000 boats are registered. 


26.33: (a) Ho: Population correlation = 0 against H,: Popu- 
lation correlation > 0; t = 4.06 with df = 30 — 2 = 28, 
P = 0.0004. There is very strong evidence of a positive correla- 
tion between Gray’s forecasted number of storms and the 
number of storms that actually occur. (b) 2 = 1.6681676 + 
0.9199861(16) = 16.388, and JMP gives the 95% confidence 
interval for the mean as 14.077 to 18.699. 


26.35: The stemplot suggests that the residuals may not fol- 
low a Normal distribution. Specifically, there is a low outlier 
that seems extreme. This makes regression inference and 
interval procedures unreliable. 


26.37: (a) Shown is the scatterplot with two (nearly identi- 
cal) regression lines. (b) For all points, r = 0.8486. t = 6.00, 
P < 0.0005. (c) Without the outlier, r = 0.7014, t = 3.55, and 
P = 0.004. In both cases there is strong evidence of a linear 
relationship between neural loss aversion and behavioral loss 
aversion. Omitting the outlier weakens this evidence somewhat. 


Brains Don't Like Losses 
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26.39: The distribution is skewed right, but the sample is 
large, so t procedures should be safe. xX = 0.2781 g/m’ ands = 
0.1803 g/m’. Table C gives t* = 1.984 for df = 100 (rounded 
down from 115). 0.2871 + (1.981)(0.183/V116) = 0.2449 
to 0.3113 g/m’. 


26.41: PLAN: We examine the relationship between pine 
cone abundance and squirrel density using a scatterplot and 
regression. SOLVE: A scatterplot indicates a positive relation- 
ship that is roughly linear with what appears to be an outlier at 
the upper right of the graph. Regression gives predicted squir- 
rel density as y = 0.961 + 0.205x. The slope is significantly 
different from zero (t = 3.13, P = 0.005). For one-sided alter- 
native H,: B > 0, P = 0.0025. Conditions for inference seem 
to be violated. The residual plot shows what appears to be 
increasing variability with increasing cone values. The stem- 
plot of the residuals indicates two large positive outliers; the 
distribution may be right-skewed. CONCLUDE: We seem to 
have strong evidence of a positive linear relationship between 
cone abundance and squirrel density; however, conditions for 
inference may not be satisfied. 


Squirrel Density Residuals vs. Pine Cones 


Residual 


Cones 


26.43: PLAN: We will examine the relationship between 
beaver stumps and beetle larvae using a scatterplot and regres- 
sion. We specifically test for a positive slope 8, and find a con- 
fidence interval for 8. SOLVE: The scatterplot shows a positive 
linear association; y = —1.286 + 11.894x. A stemplot of the 
residuals does not suggest non-Normality of the residuals, the 
residual plot does not suggest nonlinearity, and the problem 
description makes clear that observations are independent. For 
Ho: B = Oversus H,: B > 0,t = 10.47 (df = 21), P < 0.0005. 
t* = 2.080 for 95% confidence. We are 95% confident that 
B is between 11.894 + (2.080) (1.136) = 9.531 and 14.257. 
CONCLUDE: We have strong evidence that beetle larvae 
counts increase with beaver stump counts. Specifically, we are 
95% confident that each additional stump is (on average) 
accompanied by between 9.5 and 14.3 additional larvae clusters. 


26.45: PLAN: Using a scatterplot and regression, we exam- 
ine how well phytopigment concentration explains DNA con- 
centration. SOLVE: The scatterplot shows a fairly strong, 
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linear, positive association; y = 0.1523 + 8.1676x. A stem- 
plot of the residuals looks reasonably Normal, but the scatter- 
plot suggests that the variability about the line is greater when 
phytopigment concentration is greater. This may make regres- 
sion inference unreliable. Finally, observations are indepen- 
dent, from the context of the problem. The slope is significantly 
different from 0 (t = 13.25, df = 114, P < 0.001). A 95% 
confidence interval for B: 8.1676 + 1.984(0.6163) = 6.95 to 
9.39. CONCLUDE: The significant linear relationship 
between phytopigment and DNA concentrations is consistent 
with the belief that organic matter settling is a primary source 
of DNA. 


26.47: (a) x = —0.00333, ands = 1.0233. For a standard- 
ized set of values, we expect the mean and standard deviation 
to be O and 1, respectively. (b) The stemplot does not look 
particularly symmetric, but it is not strikingly non-Normal for 
such a small sample. (c) The probability is about 0.0272. 


26.49: For df = 14 and a 95% confidence interval, t* = 
2.145, the interval is —0.01270 + 2.145(0.01264) = —0.0398 
to 0.0144. This interval does contain 0. 


Chapter 27 One-Way Analysis of Variance: 
Comparing Several Means 


27.1: (a) The null hypothesis is “all age groups have the same 
population mean road-rage measurement,” and the alternative 
is “at least one group has a different mean.” (b) The F test is 
quite significant, giving strong evidence that the means are dif- 
ferent. The sample means suggest that the degree of road rage 
decreases with age. (We assume that higher numbers indicate 
more road rage.) 


27.3: (a) The stemplots suggest that logging reduces the 
number of trees per plot and that recovery is slow (the one- 
year-after and eight-years-after logging stemplots are similar). 
(b) The means lead one to the same conclusion as in part (a): 
The first mean (23.75) is much larger than the other two 
(14.08 and 15.78). (c) In testing Ho: w; = 2 = M3 versus H,: 
not all means are the same, we find that F = 11.43 with df = 
2 and 30, which has P = 0.0002, so we conclude that these 
differences are significant: The mean number of trees per plot 
is significantly lower in logged areas. 


Never logged 1 year ago & years ago 
O oO |2 O \4 

O Oo |9 (0) 

1 1 2244 1 22 

1 |699 1 157789 1 |5889 
2 |0124 2 |0 2 |22 
217789 2 2 

3 15 3 3 


27.5: (a) Answers will vary due to randomness. (b) By mov- 
ing the middle mean to the same level as the other two, it is 
possible to reduce F to about 0.02, which has a P-value very 
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close to the left end of the scale (near 1). (c) By moving any 
mean up or down (or any two means in opposite directions), 
the value of F increases (and P decreases) until it moves to the 
right end of the scale. 


27.7: (a) We have sj = 25.6591, si = 24.8106, and 
sj = 33.1944, so s, = 5.065, s) = 4.981 and s; = 5.761. The 
ratio of largest to smallest standard deviation is 5.761/4.981 = 
1.16, which is less than 2. Conditions are satisfied. (b) The 
three standard deviations are s, = 17.41, sy = 18.13 and 
sc = 17.42. The ratio of largest to smallest standard deviation 
is 18.13/17.41 = 1.04, which is less than 2. Conditions are 
satisfied. 


27.9: PLAN: Examine the data to compare the effect of the 
treatments and check that we can safely use ANOVA. If the 
data allow ANOVA, assess the significance of observed differ- 
ences in mean numbers of new leaves. SOLVE: Side-by-side 
stemplots shows some irregularity but no outliers or strong 
skewness. The Minitab ANOVA output shows that the group 
standard deviations easily satisfy our rule of thumb (2.059/1.302 
= 1.58 < 2). The differences among the groups were signifi- 
cant ata = 0.05: F = 3.44, df = 3 and 27, P = 0.031. CON- 
CLUDE: Nitrogen had a positive effect, the phosphorus and 
control groups were similar, and the plants that got both nutri- 
ents fell between the others. 


Control Nitrogen Phosphorus Both 
11 | OO "1 1|0 "1 
12|0O 12 12|0O 12 
13 | O 13 | O 13 | O 13 | O 
14 14/0 14 | OOO 14 | 0000 
15 | OO 15 | OO 15 | OO 15/0 
16|0 16|0 16 16|0 
17 17 | OO 17 17| 0 
18 1610 18 18 
Minitab output 
Source DF Ss MS F P 
Sleep 3 27.21 9.07 3.44 0.031 
Error 27. ys es Et) 2.64 
Total 30 98.39 
Individual 95% CIs For Mean 
Based on Pooled StDev 
Level N Mean StDev -------- HeSSsSasS +-------- +$-------- 
C 7 13.286 2.059 (----- Pann ) 
N 8 15.625 1.685 (----- Fo---- ) 
P 8 13.500 1.414 (----- *----- ) 
NP 8 14.625 1.302 (----- *#a---- ) 
-------- 4+--------4+--------+-------- 
Pooled StDev = 1.624 13)35) 1550 16.5 


27.11: (a) 1 =3andN = 96, so df = 2 and 93. (b) I = 3 and 
N = 90, so df = 2 and 87. 


27.13: (a) No sample standard deviation is larger than 
twice any other. Specifically, the ratio of largest to smallest 


standard deviation is 2.25/1.61 = 1.40, which is less than 2. 
Conditions are safe for use of ANOVA. (b) Calculations are 
provided: 

q = 1LX 647 + 17X35 + 17 X 405 + 17 x 5.02 _ 4 9995 


47 — 4.8225)? + 17(3.75 — 4.8225)? + 17(4.05 — 4.8225)? + 17(5.02 — 4.8225)? 
MSG = L642 4.8225) + 17(3.75 — 4 ytd 4.8225)? + 17( 4.8225) 


= 25.502 
MSE = (2 = 1225 + (17 = L777 = 1)L6L + (17 = D180 _ 3 597 


68 — 4 
= Mee = 1272 


(c) We have df = 4 — 1 = 3 and 68 — 4 = 64. P = 0.0003 
(obtained using software). There is strong evidence that 
the mean status scores between the four groups studied are 
not equal—a conclusion consistent with the solution to 


Exercise 27.2. 


27.15: (c) the means of several populations 


27.17: (c) The alternate hypothesis for ANOVA is always 
that there is some difference in the means (but it does not 
specify the type of difference). 


27.19: (a) P = 0.026, so we reject Hp and conclude that 
there is a difference in mean breaking strength. 


27.21: (a) This is the problem of multiple comparisons. 


27.23: (c) We do not have three independent samples from 
three populations. 


27.25: The populations are college students that might view 
the advertisement with an art image, college students that 
might view the advertisement with a non-art image, and col- 
lege students that might view the advertisement with no 
image. The response variable is student evaluation of the 
advertisement on the 1-7 scale. We test the hypothesis 
Ho: fy = by = My; (all three groups have equal mean adver- 
tisement evaluation) versus H,: not all means are equal. I = 3, 
n, =m =n; = 39, so N = 39 + 39 + 39 = 117. There are 
then] —-1=3—1=2andN —-I=117-3=114df. 


27.27: The response variable is hemoglobin Alc level. 
I = 4: a control (sedentary) population, an aerobic exercise 
population, a resistance training population, and a combined 
aerobic and resistance training population. We test 
Ho: bi = Po = Bs = by (all four groups have equal mean 
hemoglobin Alc levels) versus H,: not all means are equal. 
n, = 41,n) = 73,n3 = 72, andn, = 76.N =41 +73 +72 + 
76 = 262. We have I — 1 = 4 — 1 =3 and N — 1 = 262 — 
4 = 258 df. 


27.29: (a) The graph suggests that emissions rise when a 
plant is attacked because the mean control emission rate is half 
the smallest of the other rates. (b) The null hypothesis is “all 
groups have the same mean emission rate.” The alternative is 
“at least one group has a different mean emission rate.” (c) 
The most important piece of additional information would be 
whether the data are sufficiently close to Normally distributed. 
(From the description, it seems reasonably safe to assume that 
these are more or less random samples.) (d) SEM = s/ V8, so 


CHAPTER 27 @® One-Way Analysis of Variance: Comparing Several Means 


s = SEMV8; they are 16.77, 24.75, 18.78, and 24.38. This 
factor cancels out in the process of finding the ratio, so we can 
find this ratio directly from the SEMs: $2 = ee = 1.48, 
which satisfies our rule of thumb. 


30 5 
= 
) 25 4 
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Control Flea Beetle = Hornworm Leaf Bug 
Group 


27.31: (a) The treatment means suggest that extra water 
in the spring has the greatest effect on biomass, with a lesser 
effect from added water in the winter. ANOVA is risky 
with these data; the standard deviation ratio is nearly 3 
(58.77/21.69 = 2.71), and the winter and spring distributions 
may have skewness or outliers (although it is difficult to judge 
with such small samples). 


Level N Mean StDev 
Control 6 13.665 21.69 
Spring 6 315.39 37.34 
Winter 6 205.17 58.77 


(b) We test Ho: Wy = M, = Me (all treatments have the same 
mean) versus H,: at least one mean is different. (c) ANOVA 
gives a statistically significant result (F = 27.52, df = 2 and 15, 
P < 0.0005), but as noted in part (a), the conditions for 
ANOVA are not satisfied. Based on stemplots and the means, 
however, we should still be safe in concluding that added water 
increases biomass. 


One-Way ANOVA: Biomass versus Treatment 


Source DF Ss MS F P 
Treatment 2 97583 48792 27.52 0.000 
Error 15) 26593 1773 

Total 17 124176 

S = 42.11 R-Sq = 78.58% R-Sq(adj) = 75.73% 


27.33: (a) PLAN: We have data on sleep quality and depres- 
sion scores for 898 students at a large midwestern university. 
We'll have to assume these students are close to a random sam- 
ple of college students and that the students are independent of 
one another. SOLVE: Side-by-side boxplots are used to examine 
the distributions. All three groups show outliers at the high end 
of the depression score range, but with such large samples (the 
smallest is 246), it is reasonable to believe the sample means 
have Normal distributions (using the CLT). The condition on 
standard deviations is satisfied because 4.719/2.560 = 1.84 < 2. 
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We have F = 75.52 with df = 2 and 895, giving P = 0.000 (to 
three decimal places). CONCLUDE: The mean depression 
scores for the three levels of sleep quality are not the same. From 
the output and graphs, it appears the mean depression score for 
poor sleepers is highest; the mean depression score for optimal 
sleepers is lowest. (b) Assuming the students were randomly 
selected, the large sample size would lead us to believe these 
students are most likely representative of other college students. 
(c) Students were not randomly assigned to sleep conditions. 
Explanations about causation may vary, but this might well be a 
case of one condition (poor sleep) feeding the other (depression) 
in a “vicious cycle.” 


Sleep Quality and Depression 


25-4 * 
- 
7 ae 
* 
rn 
20 4 _ 
= * * 
2 
“A 
$154 xen 
= : 
oy EKREX 
a [ 
54 —=a 


Borderline Optimal Poor 
Sleep 
Source DF ss MS F P 
Sleep 2 2162.3 1081.1 72.52 0.000 
Error 895 13343.7 14.9 
Total 897 15506.0 
Individual 95% CIs For Mean 
Based on Pooled StDev 
Level N Mean StDev ----+------ +------ +------ +---- 
Borderline 246 8.764 3.892 (---*---) 
Optimal 309 7.013 2.560 (---*---) 
Poor 343 10.656 4.719 (---*---) 
----+------ +------ +------ +---- 
Wied 8.4 9.6 10.8 


Pooled StDev = 3.861 


27.35: PLAN: We will carry out an ANOVA test for the 
equality of means. SOLVE: The ratio of largest standard 
deviation to smallest standard deviation is 2.388/1.959 = 
1.22, which is less than 2. Histograms show some evidence 
of non-Normality, and perhaps one outlier in the “No 
Weather Report” group. We proceed, as the samples are rea- 
sonably large. From the output, we have F = 20.679 with 
3 — 1 = 2 and 60 — 3 = 57 df with P = 0.000. CON- 
CLUDE: There is overwhelming evidence that the mean tip 
percentages are not the same for all three groups. Examina- 
tion of the summary statistics and the histograms suggests 
that while mean tip for the bad-report group is similar to 
that of the no-report group, the mean tip for the good- 
weather report is higher. 
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One-way ANOVA: Percent versus Report 


Source DF Ss MS F P 
Report 2 192.22 96.11 20.68 0.000 
Error 57 264.92 4.65 

Total 59 457.15 

S = 2.156 R-Sq = 42.05% R-Sq(adj) = 40.02% 

Level N Mean StDev 

Bad 20 18.180 2.098 

Good 20 22.220 1.959 

None 20 18.725 2.388 


27.37: (a) The table is given in the Minitab output; because 
4.500/3.529 = 1.28 < 2, ANOVA should be safe. The means 
suggest that logging reduces the number of species per plot and 
that recovery takes more than eight years. (b) ANOVA gives 
F = 6.02 with df = 2 and 30, so P < 0.01 (software gives 
0.006). We conclude that these differences are significant; the 
mean number of species per plot really is lower in logged areas. 


One-Way ANOVA: Species versus Group 


Source DF Ss MS F P 
Group 2 204.4 102.2 6.02 0.006 
Error 30 509.2 17.0 

Total 32 713:3,6 

S = 4.120 R-Sq = 28.64% R-Sq(adj) = 23.88% 


Individual 95% CIs For Mean 
Based on Pooled StDev 


Level N Mean StDev ------ +------ +------ +------- + 
1 12 17.500 3.529 (------ He ae as ) 
2 12. 11.750 4.372 (------ *o----- ) 
3 913.667 4.500 (------ Wwe ) 
------ 4+------4+------+-------+ 
12.0 15:0 18.0 21.0 


Pooled StDev = 4.120 


27.39: (a) See plot. (b) There is a slight increase in growth 
when water is added in the wet season, but there is a much 
greater increase when it is added during the dry season. 
(c) The means differ significantly during the first three years. 
(d) The year 2005 is the only one for which the winter biomass 
was higher than the spring biomass. 
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27.41: In addition to a high standard deviation ratio 
(117.18/35.57 = 3.29), the spring biomass distribution has a 
high outlier. 


27.43: (a) This is a comparison of two means, so it requires 
a two-sample t test. (b) This is a comparison of three means, so 
it requires ANOVA. (c) This is a comparison of three propor- 
tions, so it requires a chi-square test of homogeneity. 
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central probability under Normal curve, 380 
chi-square distributions, 584, 594 

for confidence levels, 379, 432 

defined, 379 

selection of, 386 

t, 458, 475, 506 

table, significance from, 404-405 


Crossed designs, 30-14, 30-32 
Cross-sectional data, 34 
Crunchlt! 


ANOVA, 651 
binomial probabilities, 332 
chi-square statistic, 586 
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comparing two proportions, 543, 544 

descriptive measures, 62 

expected counts, 586 

Kruskal-Wallis test, 28-27 

least-squares regression, 133 

Mann-Whitney test, 28-8 

multiple regression models, 29-40, 29-41, 

29-42, 29-45 

parallel regression lines, 29-10, 29-11 

regression inference, 617 

t confidence interval, 465 

t test, 467 

two-sample t procedures, 495, 496 

two-way table output, 168 

Wilcoxon signed rank test, 28-19 
Cumulative distribution function, 88 
Cumulative proportions 

defined, 86-87, 94 

illustrated, 87 
Cycles, 33, 35 


Data 
always looking at, 3-4 
categorical, 183 
cross-sectional, 34 
defined, 2 
exploring, 16, 179-183 
as numbers with a context, 2 
plotting, 35, 60 
science of learning from, 2 
source of, 2 
statistics as science of, 13 
summary, 181 
time-series, 34 

Data analysis 
defined, 7, 179 
exploratory, 16, 35 
before inference, 431 
organizing principles, 11 
from sampling designs, 213 
summary, 560 
where to begin, 4 

Data ethics 
behavioral and social science experiments 

and, 261-263 

clinical trials and, 260-261 
complex issues of, 254 
confidentiality and, 258-260 
importance of, 253 
informed consent and, 256-258 
institutional review boards and, 254-256 
summary, 269 

Data production 
data ethics and, 253-263 
defined, 7 
designs for, 267 
experiments, 227-245 
introduction to, 201 
observational studies in, 245 
review, 267-269 
sampling, 203-220 
summary, 268-269, 559-560 

Data sets, 35 

Degrees of freedom 
for ANOVA F test, 660 
chi-square distributions, 593, 599 
defined, 58 
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Degrees of freedom (continued) 
F distributions, 659 
individual t procedures, 29-56 
multiple linear regression model, 29-30 
population regression lines, 613 
regression standard error, 634, 29-6 
sums of squares, 30-30, 30-31 
t approximation, 500 
t distributions, 457, 475 
two-sample t statistic, 506 
Density curves. See also Normal curves 
areas under, 78 
in assigning probabilities, 291, 296 
chi-square distributions, 593 
defined, 77, 93 
describing, 78-80 
F distributions, 660 
finding with exponential distribution, 360 
histograms and, 76-77 
illustrated, 76, 77 
mean of, 79, 80 
median of, 79, 80 
outliers and, 78 
for overall pattern description, 78 
probability as area under, 290 
shapes, 77 
standard deviation of, 80 
summary, 181-182 
symmetric, 79 
t distributions, 457 
uniform, 290 
Dependent variables. See Response variables 
Deviations. See also Standard deviations 
from pattern, 25, 35 
squared, 59 
Dice, 278 
Direction 
defined, 117 
negative association and, 107, 117 
positive association and, 107, 117 
relationships, 106, 117 
scatterplots, 105-106 
Discrete random variables, 293, 296 
Disjoint events 
addition rule for, 284, 296, 304, 318 
defined, 284, 296, 318 
independent events and, 307 
Distributions. See also Normal distributions 
binomial, 327-339, 442 
of categorical variable, 17 
center, 25, 501 
chi-square, 584, 593-594 
conditional, 578 
continuous, 28-2 
defined, 17, 35 
describing (quantitative variable), 181 
describing with numbers, 47-66 
displaying, 181 
exponential, 358 
F, 659-661 
marginal, 164-166 
mean, 48-49 
median, 49-50 
numerical summaries of, 181 
outliers, 25-26 
permutation, 504 
picturing with graphs, 13-36 


population, 351, 366 
probability, 296 
sampling, 345-367, 441, 442 
shape, 25, 28 

skewed, 25, 35 
standard, 28-1 
symmetric, 25, 35 

t, 456-458 

technology and, 61-63 
uniform, 290 
variability, 25, 501 


Double-blind experiments, 240, 245 
Dropouts, 417 
Dummy variables. See Indicator variables 


Ecological correlation 


beware of, 146 
defined, 146, 151 


Effect size 


in number of observations, 426 
power and, 428 
small, 432 


Errors. See also Margin of error 


E 


S 


probabilities, calculating, 430 
roundoff, 17 

standard, 456 

Type I, 429 

Type Il, 429 

timated regression model, 29-55 


Events 


defined, 282, 296 

disjoint, 284, 318 
independent, 305-307, 318 
nonoverlapping intervals, 291 


outcomes, 284 
probability of, 282, 284 
in Venn diagram, 304 


Excel. See Microsoft Excel 
Expected counts 


chi-square test, 584-585 

chi-square test for goodness of fit, 595-596 
defined, 581, 598 

example, 582 

formula, 581, 582 

technology in calculation, 585-586 

in two-way tables, 581-583 


Experimental design 


block design, 242-243, 245 
completely randomized, 235 
control, 238, 245 

defined, 244 

matched pairs design, 241-242, 245 
principles of, 238, 245 
randomization, 238, 245 
replication, 238, 245 


Experiments 


advantages of, 231-232 
attention to detail and, 245 
behavioral, 261-263 
cautions about, 239-241 
clinical trial, 260-261 
comparative, 234 
confounding and, 231 
control group, 234 
defined, 149, 152, 244 
double-blind, 240, 245 
dropouts from, 417 


factors, 230, 231-232, 244 
field, 233 
laboratory, 233 
lack of realism, 240-241, 245 
with living subjects, 233 
lurking variables and, 149 
observational studies versus, 227-230 
permutation tests, 504 
placebos, 239, 245 
poor, 233-234 
producing data with, 227-245 
randomized comparative, 234-239 
social science, 261-263 
statistical analysis of, 241 
subjects, 230, 231, 244 
summary, 268-269 
treatments, 230, 231, 244 
two-factor, 232 
uncontrolled, 233 
in understanding cause and effect, 228 
use of, 228, 245 
vocabulary of, 230 
Explanatory variables 
confounded, 229 
correlation and, 113, 118 
creating new, 29-43 
defined, 102, 117 
identifying, 102 
interaction between, 29-22-29-28 
least-squares regression line and, 152 


multiple regression model and, 29-16-29-17 


plotting, 104 
regression and, 135 
two interacting, 29-29-29-30 
Exploratory data analysis 
conclusions, 201 
defined, 16, 35 
purpose of, 201 
Exponential distributions 
in finding density curves, 360 
use of, 358 
Extrapolation 
beware of, 146 
defined, 146, 151 


F distributions 
defined, 659, 666 
degrees of freedom, 659, 662 
density curves, 660 
values, 659 
F tests. See ANOVA F test 
Factorial notation, 331, 339 
Factors. See also Interactions; Main effects 
combining effects of, 231-232 
constant, 231 
defined, 230, 244, 30-13 
two-way ANOVA, 30-13, 30-28, 30-33 
Field experiments, 233 
Finite probability models, 286, 296 
First quartile. See also Quartiles 
defined, 52, 66 
finding, 92 
in five-number summary, 53 
Five-number summary 
boxplots and, 53-55 
defined, 53, 66 
use of, 59 


Flowcharts 
defined, 31-2 
illustrated, 31-3 
outline, 234, 235 
use of, 31-37 
Follow-up analysis 
contrasts, 30-10-30-13 
F test, 30-4 
for multiple comparisons, 581, 648 
one-way ANOVA, 30-4, 30-32 
two-way ANOVA, 30-32, 30-33 
Form 
defined, 106, 117 
of relationships, 106, 117 
scatterplots, 106 
Four-step process 
definition, 63 
for confidence intervals, 381 
for tests of significance, 401 


Galton, Sir Francis, 129 
General addition rule 
for any two events, 307-309 
defined, 307-308, 319 
Venn diagram, 308 
General multiplication rule, 311-313, 319 
Graphing calculators, 61. See also Texas 
Instruments graphing calculator 
Graphs. See also specific types of graphs 
as best overall picture of distributions, 60 
marginal distribution, 164-165 
purpose of, 24 
as visual tools, 35 


Histograms 
center, 25 
classes, choosing, 21-22 
creation example, 21-23 
cross-sectional data, 34 
defined, 21, 35 
density curves and, 76-77 
drawing, 22-23 
examining, 25 
function of, 35 
illustrated, 26, 27, 31 
interpreting, 24-29 
outliers, 25 
overall pattern, 25 
probability, 336 
shape, 25 
spread, 25 
variability, 25 
Hypotheses. See also Alternative hypothesis; 
Null hypothesis 
chi-square test, 591-592 
before data, 395 
Kruskal-Wallis test, 28-24 
population parameter reference, 395 
simultaneous tests of, 30-7 
stating, 394-396 
Wilcoxon rank sum test, 28-10-28-11 
Hypothesis testing 
error types, 429 
goal of, 409-410 


Independent events 


defined, 305, 318 


disjoint events and, 307 
multiplication rule for, 305, 313, 318 
positive probability, 313 
Independent observations 
defined, 327 
failure of, 327 
regression inference, 632 
residual plots, 629 
success of, 327 
Independent variables. See Explanatory variables 
Indicator variables 
defined, 29-3, 29-4, 29-55 
as dummy variables, 29-4 
use of, 29-4 
Individual t procedures 
comparing groups with, 30-6 
degrees of freedom, 29-56 
Individual t tests for coefficients. See also t tests 
confidence intervals and, 29-50 
form, 29-50 
inference for multiple regression, 29-16 
model in use and, 29-49 
multiple regression models, 29-49-29-50 
results interpretation caution, 29-49 
Individuals 
defined, 14, 34 
as subjects, 230 
variables and, 13-16 
Inference. See Statistical inference 
Inference about a population contrast, 30-11 
Inference about a population mean 
conditions for, 455-456 
matched pairs t procedures, 467-469 
Normal distribution, 455 
one-sample t confidence interval and, 458-461 
one-sample t test and, 461-464 
population as much larger than sample, 456 
resampling and standard errors and, 472-475 
simple random samples (SRS), 456 
summary, 563 
t distributions and, 456-458 
t procedures robustness and, 469-472 
technology and, 464-467 
Inference about a population proportion 
assumption, 532 
large confidence intervals and, 520-523 
overview of, 517-518 
plus four confidence intervals, 528-531 
sample proportion, 518-520 
sample size selection, 523-525 
summary, 563 
tests of significance, 525-528 
Inference about relationships 
overview, 575-576 
regression, 609-635 
two categorical variables, 577-600 
Inference about variables 
comparing two means, 485-516 
comparing two proportions, 539-557 
overview of, 453 
population mean, 455-484 
population proportion, 517-538 
review, 559-562 
summary, 562-564 
Inference for multiple regression. See also Multiple 
regression models 
conditions for, 29-15 
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constant variance, 29-14 
examples, 29-17-29-20 
F statistic, 29-15 
independence, 29-14—29-15 
individual t tests for coefficients, 29-16 
linear trend, 29-14 
Normality, 29-14 
Inference for two-way ANOVA 
examples, 30-20-30-27 
outline, 30-20 
Influential observations 
defined, 143, 151 
example, 143 
multiple regression model, 29-53 
outliers in x direction and, 145 
subjective nature of change, 143 
Informed consent. See also Data ethics 
behavioral experiments, 262 
defined, 256, 263 
difficulties of, 257 
example, 256-257 
forms, 257 
Institutional review boards. See also Data ethics 
defined, 263 
functions of, 254-256 
purpose of, 254 
web page example, 255 
work load size and, 256 
Interactions. See also Multiple regression models 
defined, 29-22 
between factors, 30-17 
main effects and, 30-16-30-17 
model development and, 29-25 
plots, 30-24, 30-25 
product term, 29-22 
two-way ANOVA, 30-24, 30-25 
two-way ANOVA F tests, 30-31 
Intercept 
defined, 129, 151 
least-squares regression line, 132 
population regression lines, 612 
regression inference, 611 
regression line, 130, 151 
Interquartile range (IQR) 
defined, 55 
formula, 55 
in 1.5 X IQR rule, 56 


JMP 
ANOVA, 651 
descriptive measures, 62 
least-squares regression, 133 
parallel regression lines, 29-10, 29-11 
regression inference, 616, 618 
t confidence interval, 465 
t test, 466 
two-sample t procedures, 495, 496 


Kruskal-Wallis test 
alternative hypothesis, 28-24, 28-29 
comparing several samples, 28-23-28-24 
conditions for, 28-24 
defined, 28-24, 28-29 
hypotheses, 28-24 
idea of, 28-24 
large sample sizes and, 28-25 
null hypothesis, 28-24, 28-25, 28-29 
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Kruskal-Wallis test (continued) 
technology and, 28-26-28-27 
ties in, 28-25 
use of, 28-30 

Kruskal-Wallis test statistic 
chi-square distribution, 28-30 
defined, 28-25, 28-29 
exact distribution of, 28-25 
formula, 28-25 
sums of ranks, 28-25 


Laboratory experiments, 233 
Lack of realism, 240-241, 245 


Large-sample confidence intervals 


for comparing two proportions, 542-543, 551 
for population proportions, 520-523, 531 


for sample proportions, 528 
Law of large numbers 

defined, 348, 366 

example, 348-349 

gambling and, 349 

mathematical proof of, 348 

statistical estimation and, 347-350 

true probability and, 337 

use of, 349 
Law of Large Numbers applet, 349 
Least-squares method, 131 
Least-squares regression 

defined, 634 

facts about, 135-138 

technology and, 132-135 
Least-squares regression lines 

cause and effect and, 148 

caution, 152 

correlation and, 146 

defined, 131 

equation, 132 

explanatory variables and, 152 

illustrated, 131, 136 

intercept, 132 

outliers in x direction and, 145 

passing through point, 136 

popularity of, 131 

predicted response, 132 

regression inference, 611 

slope, 132, 135 

technology and, 132-135 
Least-squares residuals, 140 
Least-squares slope 

correlation and, 620 

standard error, 619 

standardized version, 619 
Leaves, 29 
Left-skewed distribution, 25 
Level of significance. See Significance level 
Linear regression, 619 
Linear relationships 

defined, 107, 117 

direction of, 118 

measuring with correlation, 111-113 

regression inference, 632 

residual plots, 629 
Lines 

plotting, 129 

straight, 129, 136, 611 
Lurking variables 


beware of, 146-147 


categorical variable relationships and, 171 
confounded, 229 

defined, 147, 152 

experiments and, 149 

explanatory and response variables and, 152 
in Simpson’s paradox, 172 


Main effects 
as “on the average” effect, 30-17 
defined, 30-15, 30-17, 30-32 
interactions and, 30-16—30-17 
with no interaction, 30-15—30-16 
of repetitions, 30-16 
two-way ANOVA F tests, 30-31 
Mann-Whitney test, 28-8, 28-29 
Margin of error 
caution, 419 
confidence intervals, 383, 419-420, 431 
convenience samples and, 420 
defined, 376 
population proportion, 523-524, 531 
prediction with, 624 
sample size and, 383, 425 
small, 424-425 
smaller, 383 
voluntary response samples and, 420 
Marginal distributions 
calculating, 164-165 
conditional distributions versus, 166 
defined, 164, 173 
graph of, 165 
relationships among variables and, 166 
Matched pairs design 
as block design, 242 
defined, 241, 245, 467 
mean difference in, 28-16 
order of treatments, 241-242 
subjects, 241 
Wilcoxon signed rank test, 28-16-28-18 
Matched pairs t procedures 
defined, 467, 475 
example, 467-469 
population mean, 467 
Matched pairs t test, 28-29 
Mean and Median applet, 50 
Mean of a density curve 
as balance point, 79, 80, 93 
defined, 79, 80, 93 
in Normal distributions, 81 
notation, 80 
Mean response 
confidence intervals for, 627, 634 
estimation of, 626 
standard errors for, 627 
Mean square for error (MSE) 
in ANOVA F statistic calculation, 662 
defined, 662 
denominator, 663 
formula, 662 
Mean square for groups (MSG) 
in ANOVA F statistic calculation, 662 
defined, 662 
denominator, 663 
formula, 662 
Mean squares, 662 
Means 


binomial, 334-335, 339 


chi-square distributions, 594 

comparing several, 647-648 

comparing two, 485 

contrasts among, 30-10 

correlation and, 116 

defined, 48, 66 

example of, 48 

extreme observations and, 59 

finding, 48 

inference about a, 374 

of least-squares residuals, 140 

median comparison, 50-51 

as not a resistant measure, 48 

outliers and, 59 

population. See population mean 

process, 31-8 

of random samples, 358 

regression to the, 158 

regression toward, 129 

sample. See sample means 

of sample mean, 353 

of sampling distribution, 366, 518 

for symmetric density curves, 93 

use of, 59 

variability about the, 59 

Wilcoxon signed rank statistic, 28-17 
Median 

defined, 49, 66 

of density curves, 79, 80 

finding, 49-50 

in five-number summary, 53 

location in ordered list, 49 

mean comparison, 50-51 

for symmetric density curves, 93 
Microsoft Excel 

ANOVA, 651, 653 

binomial probabilities, 333 

descriptive measures, 62 

least-squares regression, 133 

regression inference, 616 

t confidence interval, 465 

t test, 466 

two-sample t procedures, 495, 496 
Minitab 

ANOVA, 651, 653, 658, 30-3, 30-8 

binomial probabilities, 332 

chi-square statistic, 585 

comparing two proportions, 543-544 

descriptive measures, 62 

expected counts, 585 

interaction plots, 30-24, 30-25 

Kruskal-Wallis test, 28-26 

least-squares regression, 132 

multiple regression models, 29-45 

one-way ANOVA, 28-26, 30-39 

parallel regression lines, 29-10, 29-11 

power calculation, 428 

probability histogram, 336 

regression inference, 616 

regression model with two regression lines, 

29-24 

simple linear regression, 29-35 

t confidence interval, 465 

t test, 466 

Tukey pairwise multiple comparisons, 30-39 

two-sample t procedures, 495, 496 

two-way ANOVA, 30-22, 30-26 


two-way table output, 168, 591 

Wilcoxon signed rank test, 28-19 

Modified boxplots, 56 

Mosaic plots 

defined, 169 

illustrated, 169 

information in, 169-170 

Multiple analyses, 423 

Multiple comparisons 

alternative hypothesis and, 580 

conclusions, 30-10 

confidence intervals and, 580 

defined, 30-2 

follow-up analysis, 581, 648 

null hypothesis and, 580 

overall test, 581, 648 

problem definition, 580 

problem of, 580-581, 648 

P-values and, 580 

statistical methods for dealing with, 580-581, 
648 

steps for dealing with, 648 

Tukey pairwise, 30-7, 30-32, 30-33 

Multiple linear regression model. See also Multiple 
regression models 

B parameters estimation, 29-30 

defined, 29-28 

degrees of freedom, 29-30 

examples, 29-28-29-32 

interaction of explanatory variables, 
29-29-29-30 

Normal distribution, 29-28 

population standard deviation estimation, 29-30 

standard deviation, 29-28 

Multiple regression case study 

automated algorithms, 29-41 

creating new explanatory variable, 29-43 

Crunchlt! output, 29-40, 29-41, 29-45 

Crunchlt! parameter estimates, 29-42 

descriptive statistics and correlation 
coefficients, 29-39 

final model, 29-44 

goal, 29-36 

highest correlation, 29-41 

including all explanatory variables, 29-40-29-41 

including other explanatory variables, 
29-42-29-43 

marketing data, 29-36-29-38 

Minitab output, 29-45 

overview of, 29-36 

regression model construction, 29-36 

relationships among variables, 29-38-29-39 

Multiple regression models 

ANOVA F test, 29-49 

automated algorithms, 29-41 

building, 29-36 

case study, 29-36-29-48 

choosing, 29-25-29-26 

conditions for inference, 29-53-29-55 

confidence intervals for, 29-50, 29-56 

defined, 29-2 

estimated, 29-55 

explanatory variables and, 29-16-29-17 

F statistic for, 29-15 

flexibility of, 29-28-29-32 

general multiple linear regression model, 


29-28-29-34 


illustrated, 29-5 
individual t tests for coefficients, 29-49-29-50 
inference for, 29-13-29-22 
inference for regression parameters, 
29-48-29-52 

influential observations, 29-53 
interaction, 29-22-29-28 
interpreting, 29-4 
outliers, 29-53 
overview of, 29-1-29-2 
parallel regression lines, 29-2—29-5 
parameter estimation, 29-5-29-10 
predication intervals for, 29-50, 29-56 
prediction and, 29-50 
quadratic regression model, 29-30-29-32 
regression coefficients and, 29-34-29-36 
residual plots, 29-53, 29-54 
residuals, 29-5-29-6 
simple linear regression model and, 29-57 
technology and, 29-10 
with two regression lines, 29-23, 29-55 

Multiplication rule 
for any two events, 311 
extended, 312 
general, 311-313, 319 
for independent events, 305, 313, 318 
occurrence of, 307 

Multistage samples, 213 


Natural tolerances, 31-29-31-30, 31-37 
Negative association 
defined, 106, 107 
direction and, 117 
indication, 113 
NHANES survey, 417-418 
95% confidence interval, 423 
No linear relationship, testing hypothesis of, 
619-620 
Nonnumerical outcomes, 294 
Nonparametric tests 
comparison of, 28-2 
defined, 28-2, 28-29 
Kruskal-Wallis test, 28-23-28-29 
Mann-Whitney test, 28-8 
overview of, 28-1 
replacing data by ranks, 28-30 
Wilcoxon rank sum test, 28-2—28-15 
Wilcoxon signed rank test, 28-16-28-23 
Nonresponse 
defined, 214, 219 
inference and, 417 
rates of, 214-215 
web surveys and, 217 
Nonsense correlation, 148 
Normal approximation 
accuracy of, 336 
to binomial distribution, 335-338, 339 
of binomial probability, 337 
central limit theorem, 360 
rank sum statistic, 28-6-28-8 
remembering, 335 
rule of thumb, 337 
Wilcoxon rank sum test, 28-11 
Wilcoxon signed rank statistic, 28-18-28-20 
Normal Approximation to Binomial applet, 337 
Normal curves. See also Density curves 
central area under, 379 
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characteristics of, 80-81 
critical value, 380 
defined, 80, 93 
illustrated, 81 
locating points on, 91 
Normal Density Curve applet, 88 
Normal distributions 
ANOVA, 656, 666 
calculating percentage in interval for, 94 
central limit theorem and, 356 
in comparing two means, 487 
as condition for inference about a mean, 456 
as continuous probability models, 291 
defined, 80, 81, 93 
importance of, 82 
multiple linear regression model, 29-28 
process monitoring, 31-8 
sample mean and, 355, 367 
sample standard deviation and, 31-13 
as simple condition, 374 
68-95-99.7 rule, 82-85, 93-94 
special properties of, 81 
specification of, 81 
standard, 85-86, 94 
standard deviation, 502 
summary, 181-182 
t confidence interval and test and, 469-470 
tests for population mean, 402 
two-way ANOVA, 30-32 
use of, 94 
Normal population procedures, 502 
Normal proportions 
finding, 86-88 
using a table to find, 89 
Null hypothesis. See also Hypotheses 
abbreviation, 395 
chi-square test, 592 
chi-square test for goodness of fit, 595, 596 
chi-square test for homogeneity, 592 
chi-square test for independence, 592 
comparing proportions, 545 
defined, 394, 409 
evidence against, 398-399 
Kruskal-Wallis test, 28-24, 28-25, 28-29 
of no treatment effect, 502 
one-way ANOVA, 666 
plausibility of, 421 
P-values and, 398-399 
rejection consequences, 421 
testing, 401 
tests of significance for, 409 
two categorical variables and, 580 
Null model, 29-15 
Numbers 
with context, data as, 7 
describing distributions with, 47-66 
faked, 287 
random, 278 
written differences of, 21 
Numerical outcomes, 293 
Numerical summaries 
center in, 66 
choosing, 59-60 
of distributions, 181 
as misleading, 60, 66 
plotting data and, 60, 66 
variability in, 66 
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Observational studies 
cause and effect and, 228 
confounding and, 229 
defined, 228, 244 
experiments versus, 227-230 
learning from, 3 
planning, 424-431 
in producing data, 245 
sample surveys, 204-205, 214-216, 219, 227 
Observations 
in binomial distributions, 328 
binomial distributions, 339 
cost and time of taking, 424 
in data sets, 35 
extreme, 59 
in five-number summary, 53 
independent, 327 
influential, 143-145 
number determination factors, 426 
ranks of, 28-3, 28-29 
standardizing, 85-86 
1.5 X IQR rule, 56 
One-sample t confidence interval, 458-461 
One-sample t statistic, 457, 475 
One-sample t test, 461-464 
One-sample z test statistic, 401, 409, 475 
One-sided alternative hypothesis, 394, 395, 409 
One-sided P-value, 397, 458 
One-Variable Statistical Calculator applet, 31 
One-way ANOVA. See also ANOVA 
ANOVA tables, 30-29 
beyond, 30-1-30-5 
calculations, 663-664, 666 
comparisons, 30-1 
conditions for, 656-659, 666 
defined, 655, 666, 30-1 
degrees of freedom for error, 30-11 
details of, 653, 661-665 
example, 30-2-30-4 
F test, 30-29 
follow-up analysis, 30-4, 30-32 
independent SRSs, 656, 666 
interpreting, 666 
mean response for several treatments, 28-23 
Normal distribution, 656, 666 
rejection of null hypothesis, 666 
robustness, 666 
standard deviation, 656, 666 
technology and, 650-653 
total sum of squares, 30-29 


two-way ANOVA output comparison, 30-28, 


30-30-30-31 

variability of observations, 656 
One-Way ANOVA applet, 654 
Ordered lists, 49 
Outcomes 

event, 284 

nonnumerical, 294 

numerical, 293 

personal probability of, 294-296 

probability, 284 

proportion, 517 

random number generator, 289 

sample space, 282 

tree diagrams and, 316 
Outliers 

checking for, 470 


defined, 25, 35 

density curves and, 78 

explanations for, 59 

identifying, 55-57 

mean and, 59 

multiple regression model, 29-53 

not hiding, 48 

1.5 X IQR rule for, 56 

regression inference and, 632 

removing, 28-1 

sample mean values and, 59-60 

scatterplots, 106, 117 

standard deviation and, 59 

statistical inference and, 418 

in x direction, 144, 145 
Out-of-control signals 

defined, 31-20, 31-37 

false alarms, 31-20, 31-21 

p charts, 31-33 
Overall confidence level, 30-7, 30-32 
Overall patterns 

defined, 25, 35 

density curves for, 78 

examples, 25-28 

histograms, 25 

residual plots, 140-141 

in scatterplots, 106 

time plots, 33, 34 
Overall significance level, 30-7 
Overall test, for multiple comparisons, 581, 648 


p charts 
center line, 31-33 
control limits, 31-33-31-36 
defined, 31-32, 31-37 
illustrated, 31-35 
interpretation of, 31-37 
out-of-control signals, 31-33 
settings, 31-32 
usefulness examples, 31-32 
using past data, 31-33 
Pairwise differences 
contrast, 30-11 
defined, 30-6 
Parallel regression lines. See also Multiple 
regression models 
defined, 29-2, 29-55 
example, 29-2-29-3 
illustrated, 29-5 
indicator variable, 29-3-29-4 
overall F test for, 29-15 
residual plot, 29-19 
technology and, 29-10-29-13 
Parameters 
defined, 346, 366 
estimating with sample statistic, 367 
population, hypotheses and, 395 
regression inference, 612-615 
sample statistics and, 346-347 
Pareto charts, 31-5, 31-34, 31-37 
Past data 
control charts, 31-24-31-26, 31-33, 31-37 
p charts, 31-33 
s charts, 31-24, 31-25 
x charts, 31-24, 31-26 
Patterns 
deviations, 25, 35 


faked numbers, 287 
overall, 25, 33, 34, 35 
probability description of, 280 
in residual plots, 631 
underlying, process of identifying, 118 
Permutation distribution, 504 
Permutation tests 
defined, 504 
small samples and, 28-2 
use of, 504, 507 
Personal probabilities 
defined, 295 
outcomes, 294-296 
set of, 295 
Pie charts 
defined, 17, 35 
function of, 35 
as having limited use, 19 
illustrated, 18 
requirements of, 17 
Placebo effect, 239-240 
Placebos, 239, 245 
Planning studies 
planning inference and, 432 
power of statistical test and, 426-431 
sample size for confidence intervals and, 
424-425 
two-sample problems, 497 
Plotting a line, 129 
Plotting your data, 35, 60 
Plus four confidence intervals 
for comparing two proportions, 549-551 
for population proportions, 528-531 
Plus four estimate 
accuracy, 530 
for comparing proportions, 549-551 
confidence intervals for proportions, 528-531 
defined, 528-529, 531 
example, 529 
formula, 529 
Pooled sample deviation, 663 
Pooled sample proportion 
defined, 546, 551 
significance tests for comparing two propor- 
tions, 545-546, 551 
Pooled sample variance, 663 
Pooled two-sample t statistic, 501 
Population contrasts 
defined, 30-10 
estimating, 30-11 
inference about, 30-11 
Population distributions 
defined, 351, 366 
description, 351 
Normality assumption, checking, 474 
sampling distribution and, 354 
shape of, 418 
Population mean 
confidence intervals for, 375, 379-383, 386, 
415, 455, 475 
defined, 366 
estimation of, 350 
inference about, 455-476 
matched pairs t procedures, 467 
notation, 346 
pairwise differences, 30-6 
sample mean and, 350 


simple random samples (SRS), 406 

statistical inference, 415 

tests for, 400-404 

tests of significance for, 455 

two, comparing, 487-489 

unbiased estimator of, 353 

z test for, 401 

Population proportions 

comparing, 539-552 

confidence interval, 524 

defined, 517 

difference between sample proportions 
and, 545 

estimating, 523 

inference about, 517-532 

large-sample confidence intervals for, 
520-523, 531 

margin of error, 523-524, 531 

plus four confidence intervals for, 528-531 

P-values, 526 

sample size and, 523-525, 531 

sampling distribution of difference between, 541 

significance tests for, 525-528, 531 

of successes, 526 

Population regression lines 

defined, 634, 29-4 

degrees of freedom, 613 

estimating, 612 

intercept, 612, 623 

residuals, 613 

slope, 612 

slope, confidence intervals for, 622-624, 634 

straight-line relationship with, 611 

Population regression model, 29-28 

Population standard deviation 

confidence interval for the mean and, 386 

notation, 346 

as simple condition, 374, 416 

simple random samples (SRS), 406 

tests for population mean, 402 

Populations 

defined, 203, 204, 219 

inference about, 211-212 

sample surveys and, 204 

samples versus, 204—206 


size, sample size and, 425 


Positive association 


defined, 107 
direction and, 117 
indication, 113 


Power 


alternative hypothesis and, 428 
calculating, 427 

defined, 427 

finding with applet, 427 

finding with software, 428 

in number of observations, 426 
in planning studies, 426-431 
sample size and, 428, 432 
significance tests, 426-431, 432 


Predication intervals 


defined, 626, 634 

interpretation of, 626 

meaning of, 626 

for multiple regression models, 29-50, 29-56 
for regression response, 627 

as rough approximations, 629 


for single observation, 627 

for single response, 629 
Predicator variables. See Explanatory variables 
Prediction 

inference about, 624-628 

with margin of error, 624 

regression models and, 29-50 
Privacy policies, 258, 259 
Probability 

as area under density curve, 290 

assigning in finite model, 286 

assigning in infinite discrete sample space, 286 

binomial, 330-332 

conditional, 309-311 

defined, 267, 277, 296 

idea of, 278-279 

introduction to, 277-297 

law of large numbers and, 337 

long-run regular pattern, 280 

of observing sample statistic, 362 

outcomes, 284 

personal, 294-296 

proportions and, 279 

random samples and, 297 

randomized comparative experiments and, 297 

randomness and, 279 

success, 339 

summary, 441 

of Type I and Type II errors, 429 
Probability applet, 279, 280 
Probability distribution, 296 
Probability models 

continuous, 289-293, 296 

defined, 282, 296 

in describing long-run behavior, 296 

description, 281 

discrete, 286 

finite, 286, 296 

list of, 303 

tree diagrams, 314-318 

uses for, 303 
Probability rules 

addition rule for disjoint events, 307-309 

conditional probability, 309-311 

in describing long-run behavior, 296 

general, 303-319 

general addition rule, 307-309 

general multiplication rule, 311 

list of, 284 

multiplication rule, 307 

summary, 441-442 

using, 284-285 
Procedures 

based on sample statistics, 418 

Normality basis, 418 

t, 459, 467-472 

z, 415-418, 440 
Process mean 

defined, 31-8 

estimating, 31-22-31-23 

process in control and, 31-9 
Process monitoring 

conditions for, 31-8 

defined, 31-8, 31-37 

Normal distribution, 31-8 

s charts for, 31-13-31-19 

x charts for, 31-7-31-13 
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Process proportions, 31-33 
Process standard deviation 
defined, 31-8 
estimating, 31-23-31-24 
process in control and, 31-9 
target value and, 31-23 
Processes. See also Statistical process control 
capability, 31-30-31-31, 31-37, 31-38 
cause-and-effect diagrams, 31-2, 31-3, 31-4 
common cause variation, 31-6 
in control, 31-6, 31-31, 31-37 
defined, 31-2 
describing, 31-2-31-6 
examples of, 31-2 
flowcharts, 31-2, 31-3 
as focus, 31-28 
measurements made on, 31-4 
operations, 31-2 
out of control, 31-6, 31-37 
in sample-versus-population framework, 31-2 
special cause variation, 31-6 
summary, 31-37-31-38 
Proportions 
comparing, 539-552 
confidence intervals for, 521 
cumulative, 87, 94 
finding values given, 91-93 
large-sample confidence intervals for, 520-523 
Normal, 86-88 
outcome, 517 
plus four confidence intervals for, 528-531 
population, 517-532 
probability and, 279 
sample, 518-520, 531 
sampling distribution of difference between, 541 
significance tests for, 525-528 
of successes, 339 
Pseudorandom numbers, 280 
P-Value of a Test of Significance applet, 397 
P-values 
ANOVA F test, 660 
approximation of, 407 
chi-square test, 593 
defined, 396, 409, 440 
finding, 406 
finding, automating, 397 
how small as convincing, 421, 431 
large, 396 
multiple comparisons and, 580 
null hypothesis and, 398-399 
one-sample t test, 461-462 
one-sided, 397, 458 
population proportion, 526 
resampling, 406, 407 
statistical significance and, 396-400 
t distributions, 476 
technology for getting, 404 
two-sided, 398, 458 
Wilcoxon rank sum test, 28-8, 28-29 
Wilcoxon signed rank test, 28-29 


Quadratic regression model, 29-30-29-32 
Quantitative variables 

in correlation, 114, 115 

defined, 14, 35 

histograms and, 21-24 

as measure of quality, 31-8 
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Quantitative variables (continued) 
stemplots and, 29-32 
units of measurement, 14 
values, 21 

Quartiles 
area under the curve, 93 
calculating, 52 
defined, 52, 66 
finding, 52-53 


first, 52, 66 
third, 52, 66 
R charts 


defined, 31-20, 31-37 
out-of-control signal, 31-20 
using, 31-19-31-20 
Random digit dialing 
call screening and, 216 
cell phones and, 216 
defined, 210, 219 
landline future and, 217 
Random digits 
in sample selection, 219 
selecting, 289 
table of, 208, 245, 289 
Random number generator, 289 
Random numbers, from computer program, 280 
Random samples 
chance and results, 345 
large versus small, 211-212 
means of, 358 
multistage, 213 
probability and, 297 
simple (SRS). See simple random samples (SRS) 
stratified, 212, 219 
variability and, 278 
Random sampling 
bias and, 211 
defined, 212, 219 
designs for, 212-213 
rational subgroups versus, 31-28-31-29 
reasons to use, 211 
Random sampling error, 419 
Random variables 
continuous, 293, 296 
defined, 293, 296 
discrete, 293, 296 
notation, 293 
Randomization 
chance, 245 
as experimental principle, 238, 245 
in prevention of bias, 245 
Randomized comparative experiments 
bias and, 245 
chance and results, 345 
completely randomized design, 235 
control groups, 239 
defined, 234 
double-blind, 240 
illustrated, 268 
lack of realism, 240-241, 245 
logic of, 237-239 
for medical advances, 237 
placebos, 239, 245 
probability and, 297 
replication, 238 
statistically significant, 238 


subject assignment, 237 
subjects, 239 
treatments, 237 
use of, 245 
Randomized trials, 261 
Randomness 
defined, 279 
probability and, 279 
search for, 280-281 
understanding, 279 
Rank tests 
center of populations, 28-2 
comparison of, 28-2 
continuous distributions requirement, 28-2 
defined, 28-2, 28-29 
idea of, 28-3 
Kruskal-Wallis test, 28-23-28-29 
ranks of observations, 28-3, 28-29 
statistics summary, 28-30 
summary, 28-30 
ties in, 28-11-28-15, 28-20-28-23 
Wilcoxon rank sum test, 28-2—28-15 
Wilcoxon signed rank test, 28-16-28-23 
Rational subgroups 
choosing samples for, 31-29 
defined, 31-28, 31-37 
random sampling versus, 31-28-31-29 
Realism, lack of, 240-241, 245 
Reasoning of a Statistical Test applet, 392 
Regression. See also Multiple regression models 
cautions, 183 
correlation and, 151 
defined, 127 
explanatory and response variables 
and, 135 
least-squares, 132 
linear, 619 
lurking variables and, 146-147, 152 
to the mean, 158 
outliers in x direction and, 144 
residuals, 138-143, 151 
simple linear, 29-2 
toward the mean, 129 
Regression coefficients 
hypothesis test, 29-56 
woes of, 29-34—-29-36 
Regression inference 
about prediction, 624-628 
checking conditions for, 628-633, 635 
condition violations, 628 
conditions for, 611-612, 633, 634 
confidence intervals, 622 
correlation, 611 
example, 609-611 
illustration of conditions, 612 
least-squares regression line, 611 
least-squares slope, 619, 620 
nature of data, 612 
outliers and, 632 
parameter estimation, 612-615 
questions answered by, 609 
residual plots, 628-629 
scatterplots, 610 
slope and intercept, 611 
standard deviation, 611 
straight-line relationship, 611 
technology, 615-618 


testing hypothesis of no linear relationship and, 
619-620 
testing lack of correlation and, 620-622 
using software output, 634 
Regression lines 
correlation and, 146 
defined, 127, 151 
extrapolation, 146, 151 
finding for relationships, 137 
good, 131 
intercept, 130, 151 
least-squares, 131-132 
parallel, 29-2—-29-5 
population, 611, 612, 634 
for prediction, 137 
residual plots and, 140 
residuals and, 138-143 
slope of, 130, 151 
slope size, 130 
summary, 182-183 
using, 129 
Regression parameters, inference for, 612-615, 
29-48-29-52 
Regression slope 
confidence intervals for, 622-624 
defined, 622 
estimation of, 622-623 
example, 623 
significance test for, 619 
t test, 634 
Regression standard error 
calculation of, 613 
defined, 613, 634, 29-6, 29-55 
degrees of freedom, 634, 29-6 
as key measure, 614 
Relationships. See also Associations 
between categorical variables, 163 
defined, 107 
direction, 106, 117 
displaying with scatterplots, 103-105 
explanatory-response, 101-102 
finding regression line for, 137 
form of, 106, 117 
inference about, 575-576 
linear, 107, 117 
row and column variables, 599 
strength of, 106, 117 
Replication, 238, 245 
Resampling 
accuracy, 408 
defined, 407 
in estimating standard deviation, 473 
in estimating standard error of the sample 
mean, 473 
in estimating standard error of the sample 
median, 474 
example, 407-408 
repeated SRSs and, 408 
standard errors and, 472-475, 476 
Residual plots 
conditions for, 629 
defined, 140, 628 
illustrated, 141, 631 
independent observations and, 629, 632 
linear relationship and, 629, 632 
multiple regression model, 29-53, 29-54 
Normal residuals and, 629, 632 


overall patterns, 140-141 
parallel regression lines, 29-19 
patterns in, 631 
quadratic regression model, 29-31 
for regression inference, 628-629 
regression line and, 140 
standard deviation and, 629, 632 
Residuals 
defined, 138, 151, 613 
finding, 140 
multiple regression model, 29-5-29-6 
Normal, 629, 632 
population regression lines, 613 
standardized, 642 
Resistant measure, 48, 66 
Response bias, 215, 219 
Response variables 
correlation and, 113, 118 
defined, 102, 117 
identifying, 102 
plotting, 104 
regression and, 135 
Right-skewed distribution 
defined, 25 
density curve, 77 
histogram, 77 
Robustness 
ANOVA F test, 656, 30-15 
defined, 28-1 
Normal population procedures, 502 
t procedures, 469-472 
two-sample t procedures, 497-499, 507 
Rounding, 30 
Roundoff errors, 17, 167 
Row totals, 173 
Row variables, 163, 173 
Runs signal, 31-20-31-21 


s charts. See also Control charts 
center line, 31-14, 31-24 
chart setup, 31-24-31-25 
constants, 31-14, 31-15 
in control, 31-17 
control limits, 31-14, 31-24 
defined, 31-37 
example, 31-15-31-16 
illustrated, 31-15, 31-16 
lack of control on, 31-17 
Normal distribution and, 31-13 
out-of-control signal, 31-20 
for process monitoring, 31-13-31-19 
special causes, 31-17 
using past data, 31-24, 31-25 
Sample contrasts, 30-11 
Sample means 
behavior of, 348 
computed from SRS, 367 
defined, 366 
mean of, 353, 31-23 
Normal distribution and, 355, 367 
notation, 346 
outside control limits, 31-9 
population mean and, 350 


resampling for estimating standard error of, 473 


sample number versus, 31-10 
sample size and, 353, 355 


sampling distribution of, 352-355, 375, 406, 442 


standard deviation of, 353 
standard error of, 456 
trust of, 353 
two, comparing, 487 
watching change, 349 
Sample median, 474 
Sample proportions 
behavior of, 519 
control charts for, 31-32-31-36 
defined, 518, 531 
difference between, 540, 545 
equation, 518 
large-sample confidence intervals for, 528 
notation, 518 
out-of-control signals, 31-33 
pooled, 546, 551 
sampling distribution of, 518 
standard error of, 520, 551 
of successes, 518, 519 
Sample size 
of confidence intervals, 424-425 
importance of, 353 
large, 432 
margin of error and, 383, 425 
population proportion and, 523-525, 531 
population size and, 425 
power and, 428, 432 
sample mean and, 353, 355 
significance and, 421-422 
t procedures and, 470 
Sample space 
defined, 282, 296 
infinite discrete, 286 
interval of numbers, 289 
outcomes of, 282 
simplicity or complexity, 282 
Sample standard deviation. See also s charts 
facts, 31-14 
Normal distribution and, 31-13 
notation, 346 
Sample statistics 
defined, 346, 366 
in estimating unknown parameter, 367 
inference procedures based on, 418 
parameters and, 346-347 
probability of observing, 362 
sampling distribution of, 362 
standard deviation, 473 
Sample surveys 
cautions, 214-216 
comparing two proportions, 547 
defined, 219 
example, 204-205 
nonresponse, 214-215, 219 
as observational study, 227 
planning, 204 
populations and, 204 
response, 219 
response bias, 215, 219 
undercoverage, 214 
wording effects, 215, 219 
Samples 
chance in choosing, 278 
choosing, 203 
comparing several, 28-23-28-24 
comparing two, 28-2-28-6 
convenience, 206 
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defined, 204, 219 

imitating, 350 

multistage, 213 

nonresponse in, 417 
populations versus, 204-206 
simple random, 207-211, 219 
voluntary response, 207, 219 


Sampling 


biased, 220 

binomial distributions in, 328-329 
impact of technology, 216-219 
meaningful conclusions and, 220 
poor, 206-207 

in producing data, 203-220 
random. See random sampling 
summary, 268 


Sampling designs 


data analysis from, 213 
defined, 204, 219 
random samples, 212-213 


Sampling distributions 


center, 352 

central limit theorem and, 355-361 
of a count, 329 

defined, 351, 366 

difference between proportions, 541 
as idea pattern, 351 

interpretation of, 351 

mean of, 366 

mean of a density curve, 518 
obtaining without simulation, 351 
population distribution and, 354 
probability of observing values, 363 
repeated SRSs and, 386 

of sample mean, 352-355, 375, 406, 442 
of sample proportions, 518 

of sample statistic, 362 

shape, 351, 353 

standard deviation of, 353, 366, 518 
statistical significance and, 361-366 
summary, 441 

variability, 352 


Scatterplots 


background variables and, 108 
categorical variables in, 109-110 
defined, 103, 104, 117 
deviations in, 106 

direction, 105-106, 117 
displaying relationships with, 103-105 
examining, 106 

form, 106, 117 

illustrated, 104 

interpreting, 105-109 

outliers, 106, 117 

overall patterns, 105-106 
plotting values in, 104 
regression inference, 610 
residual plot, 140 

strength, 106, 117 

summary, 182 


Segmented bar graphs 


defined, 167 
illustrated, 169 
mosaic plot and, 169-170 


Shape. See also Distributions 


defined, 25, 35 


importance of, 28 
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Shape. See also Distributions (continued) 
population distribution, 418 
sampling distribution, 351, 353 

Significance level 
defined, 399 
in number of observations, 426 
overall, 30-7 
P-value and, 399 

Significance tests. See Tests of significance 

Simple conditions 
defined, 374 
Normal distribution, 374 
population standard deviation, 374, 416 
SRS, 374 
tests for population mean, 402 
verification of, 402 

Simple linear regression 
conditions for inference, 29-4 
Crunchlt! output, 29-41 
defined, 29-2 
multiple regression and, 29-57 
partial regression output, 29-35 

Simple Random Sample applet, 208, 209, 235 

Simple random samples (SRS) 
benefits of, 220 
bias and, 209 
binomial distributions and, 329 
as building block, 213 
choosing, 209 
concept, 207-208 
conclusions, 220 
as condition for inference about a mean, 456 
defined, 208, 219 
illustrated, 268 
labels, 208, 209 
population mean, 406 
population standard deviation, 406 
results, trusting, 209 
sampling distributions and, 386 
as simple condition, 374 
tests for population mean, 402 
using tables to choose, 208 

Simpson’s paradox 
defined, 171, 172, 173 
lurking variables and, 172 

Simulations 
defined, 350, 409 
obtaining sampling distributions without, 351 
significance from, 406-409 

Simultaneous tests, 30-7 

68-95-99.7 rule 
defined, 82, 93-94 
examples, 83-84 
illustrated, 82 

Skewed distributions 
defined, 25, 35 
help gained from, 30 
types of, 25 

Skewed to the left, 25 

Skewed to the right, 25 

Skewness, checking for, 470 

Slope 
defined, 129, 151 
least-squares, 619, 620 
least-squares regression line, 132, 135 
population regression lines, 612, 634 
regression, 622-624 


regression inference, 611 
regression line, 130, 151 
size of, 130 
straight lines, 129 
as zero hypothesis, 634 
Social science experiments, 262-263 
Special causes 
defined, 31-6 
s charts, 31-17 
x charts, 31-17 
Splitting stems, 31 
Spreadsheets 
defined, 15 
illustrated example, 15 
use of, 14 
Square of the correlation, 151 
Squared deviations, 59 
Squared multiple correlation coefficient (R’) 
defined, 29-12, 29-55 
form, 29-12 
interpretation with ANOVA table, 29-12 
using, 29-12 
Standard deviation of a density curve 
defined, 93 
locating, 93 
in Normal distributions, 81 
notation, 80 
as variability measure for Normal distribution, 
81 
Standard deviations 
ANOVA, 656, 657, 666 
binomial, 334-335, 339 
calculating, 57-58 
defined, 57, 66 
degrees of freedom, 58 
extreme observations and, 59 
F test for, 502 
formula, 58 
inference about, avoiding, 501-502 
multiple linear regression model, 29-28 
Normal distributions, 502 
as not a resistant measure, 59 
outliers and, 59 
pooled, 663 
population, 346 
process, 31-8 


regression inference, 611, 632 
resampling and, 473 
residual plots, 629 
sample, 346 
of sample mean, 353 
sample statistic, 473 
of sampling distribution, 353, 366, 518 
two-way ANOVA, 30-32 
units of measurement, 59 
use of, 59 
usefulness of, 59 
variability about the mean, 59 
Wilcoxon signed rank statistic, 28-17 
Standard distributions, 28-1 
Standard errors 
comparing two proportions, 520, 542, 551 
defined, 456, 472, 475 
least-squares slope, 619 
for mean response, 627 
in one-sample t confidence interval, 460 
regression, 613, 634, 29-55 


resampling and, 472-475, 476 
sample contrasts, 30-11 
of sample mean, 473-475 
of sample proportions, 520, 551 
two-sample t procedures, 489 
Standard Normal distribution, 85, 86, 94 
Standard Normal table 
defined, 88 
example, 88, 89-90 
to find Normal proportions, 89 
using, 88-91 
Standardized residuals, 642 
Standardized value, 85 
Standardized variable, 94 
Standardizing 
defined, 85, 93 
example, 85, 89, 90 
Statistical estimation 
law of large numbers and, 347-350 
reasoning of, 374-376 
Statistical inference. See also Confidence intervals; 
Tests of significance 
binomial distributions, 339 
chance and, 416 
checking conditions for, 381-382 
conclusions, 201, 439 
conditions for, 416-419 
data analysis before, 431 
data source and, 416 
defined, 7, 11, 211, 373, 439 
dropouts and, 417 
goals of, 275 
methods of, 561 
multiple regression, 29-14-29-20 
nonresponse and, 417 
outliers and, 418 
planning, 432 
population mean, 374, 415, 455-476 
population proportion, 517-532 
populations, 211-212 
in practice, 415-433 
probability ideas, 347-348 
purpose of, 201 
reasoning of, 560 
regression, 619-635 
reliability, 416 
shape of population distribution and, 418 
simple conditions, 374 
standard deviations, avoiding, 501-502 
two-way ANOVA, 30-20-30-27 
use of, 431 
variables, 453-564 
Statistical Power applet, 427 
Statistical problems 
conclusion, 7, 63, 66 
four-step process, 63 
organizing, 63-65 
planning, 7, 63, 66 
solving, 7, 63, 66 
stating, 7, 63, 66 
two-sample, 485-486 
Statistical process control 
comments on, 31-28-31-30 
common cause, 31-6 
control charts, 31-6 
defined, 31-6 
idea of, 31-6-31-7 


introduction to, 31-1-31-2 
natural tolerances, 31-29-31-30 
process focus, 31-28 
processes and, 31-2-31-6 
rational subgroups, 31-28-31-29 
special cause, 31-6 
Statistical significance 
alternative hypothesis and, 421 
defined, 238, 363, 399, 422 
example, 363 
findings, interpretation of, 399 
lack of, 432 
multiple analyses and, 423 
practical significance versus, 422 
P-value and, 396-400 
reasoning behind, 423 
sample size and, 421-422 
sampling distributions and, 361-366 
from a simulation, 406-409 
standard levels of, 421 
from a table, 404-405 
Statistical tests, reasoning of, 391-392 
Statistics 
in dealing with variation, 6 
defined, 2, 13 
elements of, 7 
learning, 8 
main ideas of, 8 
as science of data, 13 
Statistics in Summary flowchart 
defined, 561 
illustrated, 562 
in planning inference problems, 561 
treatment comparison and, 561 
Stemplots 
back-to-back, 43, 28-3 
classes, 30 
creating, 29 
creation example, 29-30 
defined, 35 
function of, 35 
illustrated, 30 
leaves, 29 
rounding, 30 
side-by-side, 646, 30-3 
splitting stems, 31 
stems, 29 
use of, 29 
Stems 
defined, 29 
reducing with rounding, 30 
splitting, 31 
Straight lines 
correlation and strength, 136 
equation, 129 
intercept, 129 
slope, 129 
Strata, 212, 219 
Stratified random samples, 212, 219 
Strength 
correlation measurement of, 114, 115, 116 
defined, 106, 117 
of relationships, 106, 117 
scatterplots, 106 
Subjects 
clinical trials, 260, 261 
controlling environment of, 231 


defined, 230, 244 
informed consent, 256 
in matched pairs design, 241 
randomized comparative experiments, 
237, 239 
Sum of squares for groups (SSG), 28-24 
Sums of squares 
defined, 663 
degree of freedom, 30-30, 30-31 
total, 30-29-30-31 
Symmetric density curves 
defined, 79 
mean for, 93 
median for, 93 
Symmetric distributions, 25, 35 


t approximation 
defined, 500 
degrees of freedom, 500 
details of, 499-501 
example, 500 
t confidence intervals 
Normal distribution and, 469-470 
output, 465 
technology and, 464-467 
t critical values, 458, 475, 506 
t distributions 
defined, 457, 475 
degrees of freedom, 457, 475 
density curves, 457 
mathematics associated with, 476 
P-values, 476 
variability, 457 
t procedures 
matched pairs, 467-469, 475 
one-sample, 459 
robustness of, 469-472, 476 
sample size and, 470 
scale of measurement and, 28-13 
two-sample, 489-494 
using, 470 
t statistic 
interpretation of, 457 
one-sample, 457, 475 
pooled, 501 
significance tests and, 475 
in testing slope is zero hypothesis, 634 
two-sample, 489, 506 
t tests 
individual, for coefficients, 29-16 
matched pairs, 28-29 
Normal distribution and, 469-470 
one-sample, 461-464 
output, 466 
regression slope, 634 
technology and, 464-467 
two-sample, 28-29 
Tables 
critical values, significance from, 404-405 
of random digits, 208, 245, 289 
two-way, 163-173, 577-580, 581-583 
Technology 
ANOVA, 650 
comparing two proportions, 543-545 
in describing distributions with numbers, 61-63 
in expected counts and chi-square statistic 


calculation, 585-586 
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Kruskal-Wallis test, 28-26-28-27 
in least-squares regression, 132-135 
parallel regression lines, 29-10-29-13 
regression inference, 615-618 
sampling and, 216-219 
t test and, 464-467 
Wilcoxon rank sum test, 28-8-28-10 
Wilcoxon signed rank test, 28-19 
Test statistic. See also t statistic; z statistic 
calculation of, 401 
chi-square, 583-584 
defined, 409 
Kruskal-Wallis, 28-24-28-29 
one-sample, 401, 409 
for SSG, 28-24 
Tests of significance 
basics of, 391-410 
cautions about, 421-424 
comparing two means, 506-507 
comparing two proportions, 545-548, 551 
concept of, 392 
defined, 391, 409 
four-step process, 401 
idea of, 440 
interpreting, 421-422 
for null hypothesis, 409 
for population mean, 400-404, 455 
for population proportion, 525-528, 531 
power of, 426-431, 432 
P-value, 396-400 
reasoning of, 392-394, 409 
for regression slope, 619 
robust, 470 
summary, 442-443 
t statistic and, 475 
trust of, 416 
Type I error, 429 
Type Il error, 429 
use of, 373 
Texas Instruments graphing calculator 
ANOVA, 650 
binomial probabilities, 332 
chi-square statistic, 585 
comparing two proportions, 543 
descriptive measures, 62 
expected counts, 585 
least-squares regression, 132 
regression inference, 616 
t confidence interval, 465 
t test, 466 
two-sample t procedures, 495 
Third quartile. See also Quartiles 
defined, 52, 66 
in five-number summary, 53 
Ties 
in Kruskal-Wallis test, 28-25 
in Wilcoxon signed rank test, 28-20-28-23 
in Wilcoxon sum rank test, 28-11-28-13 
Time plots 
cycles, 33, 35 
defined, 32, 33, 35 
example, 33 
illustrated, 33 
overall pattern, 33, 34 
time series data, 34 
trends, 34, 35 
use of, 32, 36 
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Time-series data, 34 
Total sum of squares 
defined, 30-29 
one-way ANOVA, 30-29 
two-way ANOVA, 30-29 
Treatments 
block design, 242 
in clinical trials, 261 
defined, 230, 244 
matched pairs design, 241-242 
no effect, null hypothesis of, 502 
randomized comparative experiments, 237 
studying effects of, 231 
Tree diagrams 
conditional probability and, 316 
defined, 314 
example illustration, 315, 317 
outcomes and, 316 
Trends, 34, 35 
Tukey pairwise multiple comparisons, 30-7, 
30-32, 30-33, 30-39 
Tukey simultaneous confidence intervals, 30-7 
Two-sample problems 
comparing two population means, 487-489 
comparing two proportions, 539-541, 551 
defined, 485, 506 
example, 486 
inference about standard deviations and, 
501-502 
occurrence of, 485 
pooled two-sample t procedures 
and, 501 
sample independence, 485 
study planning, 497 
technology and, 494-497 
Wilcoxon rank sum test, 28-2—28-6 
Two-sample t procedures 
defined, 490 
examples, 490-493 
output, 495 
pooled, avoiding, 501 
population variance and, 496 
practical options, 489-490 
robustness, 497-499, 507 
standard error, 489 
technology and, 494-497 
use of, 507 
Two-sample t statistic 
approximate distribution of, 500 
calculation of, 490 
defined, 489, 506 
degrees of freedom, 506 
exact distribution of, 499 
Two-sample t test, 28-29 
Two-sample x procedure, 547 
Two-sided alternative hypothesis, 394, 409 
Two-sided P-values, 398, 458 
Two-way ANOVA. See also ANOVA; One-way 
ANOVA 
ANOVA F statistic for, 30-31 
balanced designs, 30-14, 30-32 
characteristics of, 30-4 
conditions for, 30-14, 30-32 
crossed designs, 30-14, 30-32 
defined, 30-4, 30-32 
details, 30-28-30-32 
F tests, 30-28, 30-31, 30-32 


factors, 30-13, 30-28, 30-33 
follow-up analysis, 30-32, 30-33 
independent SRSs, 30-32 
inference, 30-20-30-27 
interpreting, 30-18, 30-33 
Minitab output, 30-22, 30-26 
Normal distribution, 30-32 
one-way ANOVA output comparison, 
30-28, 30-30-30-31 
standard deviation, 30-32 
strong interactions, 30-24 
total sum of squares, 30-29 
variation breakdown, 30-30 


Two-way tables 


chi-square test, 598 

column totals, 173 

column variables, 164, 173 
conditional distribution, 166-170 
defined, 167-168, 173 

expected counts in, 581-583 
illustrated, 164 

marginal distributions from, 164-165 
output, 168 

percent calculation, 165 

row totals, 173 

row variables, 163, 173 

Simpson's paradox and, 171-172 
working with, 599 


Type I errors, 429 
Type II errors, 429 


U 


ane 


ask ca: 


nbiased estimator 

defined, 353, 366 

of population mean, 353 
ncontrolled experiments, 233 
ndercoverage 

defined, 214, 219 

web surveys and, 217-218 
niform distribution, 290 

nits of measurement 

defined, 14, 35 


standard deviation, 59 


Values 


absolute, 28-16 

critical, 379, 386 

finding, given a proportion, 91-93 
in scatterplots, 104 


Variability 


about the mean, 59 

defined, 25, 35 

distributions, 501 

measures, choosing, 59-60 

measuring with quartiles, 51-53 

measuring with standard deviation, 
57-59 

in numerical summary, 66 

random samples and, 278 

sampling distribution, 352 

t distributions, 457 


Variables. See also Distributions 


average value versus typical value, 51 
background, in scatterplots, 108 
categorical, 14, 16-21, 35, 109-110 
column, 164, 173 

confounded, 229, 244 

defined, 3, 14, 34 


distribution of, 17, 35 
effects on another, 233 
explanatory, 101-103, 117 
indicator, 29-3-29-4, 29-55 
individuals and, 13-16 
lurking, 146-147, 152, 171-172 
negative association, 106, 107 
positive association, 107 
quantitative, 14, 21-32, 35 
random, 293-294 
relationships between, 3, 101 
response, 101-103, 117 
row, 163, 173 
standardized, 94 
Variance 
defined, 57, 66 
degrees of freedom, 58 
formula, 58 
Variation 
as everywhere, 5—6 
patterns behind, 5 
statistical tools for understanding, 6 
Venn diagrams 
defined, 304 
general addition rule, 308 
illustrated, 304 
Voluntary response samples 
defined, 207, 219 
as fundamental flaw, 418 
margin of error and, 420 


Web surveys 
defined, 217, 219 
problems with, 217 
undercoverage and, 217-218 
use of, 218 

Wilcoxon rank sum statistic 
continuity correction, 28-6 
defined, 28-6, 28-29 
Normal approximation, 28-6-28-8 
standardized, 28-6 


Wilcoxon rank sum test. See also Nonparametric 


tests 
defined, 28-4, 28-29 
hypotheses, 28-10-28-11 
Normal approximation, 28-11 
P-values, 28-8, 28-29 
rank sum statistic, 28-6 
ranks, 28-3-28-4 
same-shape condition, 28-10 
sum of ranks, 28-4 
technology and, 28-8-28-10 
ties in, 28-11-28-15 
two-sample problems, 28-2-28-6 
use of, 28-30 
Wilcoxon signed rank statistic. See also 
Nonparametric tests 
defined, 28-17, 28-29 
exact distribution of, 28-20 
mean, 28-17 
Normal approximation, 28-18-28-20 
standard deviation, 28-17 
Wilcoxon signed rank test 
defined, 28-17, 28-29 
P-values, 28-29 
ranks for positive differences, 28-17 
technology and, 28-19 


ties in, 28-20-28-23 
use of, 28-30 


Wording effects, 215 


x charts. See also Control charts 


center line, 31-8, 31-14, 31-24 
chart setup, 31-25-31-26 

control limits, 31-8, 31-14, 31-24 
creating, 31-8 

defined, 31-8, 31-37 

example, 31-9-31-10, 31-15-31-16 
illustrated, 31-16 


interpreting, 31-11-31-12 
out-of-control signal, 31-20 

for process monitoring, 31-7-31-13 
runs signal, 31-20-31-21 

special causes, 31-17 

using, 31-19-31-20 

using past data, 31-24, 31-26 


z confidence interval, 425, 432 
z procedures 

defined, 415 

example, 416-417 
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Normality basis, 418 
use of, 418, 440 

z statistic 
assumption, 409 
behavior of, 422 
confidence levels for, 404 
defined, 401 
one-sample, 401, 409 
for population mean, 401 
scale, 401 

z-score, 85, 93 


